
Made in Vancouver, Canada by Picovoice
Orca is an on-device streaming text-to-speech engine that is designed for use with LLMs, enabling zero-latency voice assistants.
pip3 install pvorca
Orca requires a valid Picovoice AccessKey at initialization. The AccessKey acts as your credentials when using Orca SDKs. You can get your AccessKey for free. Make sure to keep your AccessKey secret.
Sign up or log in to Picovoice Console to get your AccessKey.
Orca supports two modes of operation: streaming and single synthesis. In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel. In the single synthesis mode, a complete text is synthesized in a single call to the Orca engine.
Create an instance of the Orca engine:
import pvorca
orca = pvorca.create(access_key='${ACCESS_KEY}')
Replace ${ACCESS_KEY} with your AccessKey obtained from Picovoice Console.
To synthesize a text stream, create an Orca.OrcaStream object and add text to it chunk by chunk:
stream = orca.stream_open()
for text_chunk in text_generator():
    pcm = stream.synthesize(text_chunk)
    if pcm is not None:
        pass  # handle pcm
pcm = stream.flush()
if pcm is not None:
    pass  # handle pcm
The text_generator() function can be any stream generating text, for example an LLM response.
Orca produces audio chunks in parallel to the incoming text stream, and returns the raw PCM whenever enough context has been added via stream.synthesize().
To ensure smooth transitions between chunks, the stream.synthesize() function returns an audio chunk that only includes the audio for a portion of the text that has been added.
To generate the audio for the remaining text, stream.flush() needs to be invoked.
When done with streaming text synthesis, the Orca.OrcaStream object needs to be closed:
stream.close()
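The streaming steps above can be combined into one helper. This is a minimal sketch, not part of the pvorca API: synthesize_stream is a hypothetical name, and it assumes the PCM returned by stream.synthesize() and stream.flush() is a sequence of integer samples. It uses only the calls shown above and closes the stream even if synthesis raises.

```python
from typing import Iterable, List


def synthesize_stream(orca, text_chunks: Iterable[str]) -> List[int]:
    """Feed text chunks to an Orca stream and collect every PCM sample returned.

    `orca` is any object exposing stream_open(), as the Orca engine does.
    """
    samples: List[int] = []
    stream = orca.stream_open()
    try:
        for chunk in text_chunks:
            pcm = stream.synthesize(chunk)
            if pcm is not None:  # None: not enough context for audio yet
                samples.extend(pcm)
        pcm = stream.flush()  # audio for the text that is still buffered
        if pcm is not None:
            samples.extend(pcm)
    finally:
        stream.close()  # release the stream even if synthesis failed
    return samples
```

Collecting all samples in a list defeats the latency benefit of streaming; in a voice assistant, each chunk would more likely be handed to an audio output as soon as it is returned.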
If the complete text is known before synthesis, single synthesis mode can be used to generate speech in a single call to Orca:
# Return raw PCM
pcm, alignments = orca.synthesize(text='${TEXT}')
# Save the generated audio to a WAV file directly
alignments = orca.synthesize_to_file(text='${TEXT}', path='${OUTPUT_PATH}')
Replace ${TEXT} with the text to be synthesized and ${OUTPUT_PATH} with the path to save the generated audio as a single-channel 16-bit PCM WAV file.
In single synthesis mode, Orca returns metadata of the synthesized audio in the form of a list of Orca.WordAlignment objects.
You can print the metadata with:
for token in alignments:
    print(f"word=\"{token.word}\", start_sec={token.start_sec:.2f}, end_sec={token.end_sec:.2f}")
    for phoneme in token.phonemes:
        print(f"\tphoneme=\"{phoneme.phoneme}\", start_sec={phoneme.start_sec:.2f}, end_sec={phoneme.end_sec:.2f}")
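If the alignment metadata needs to be stored or passed to other code, for instance to drive captions, it can be converted into plain dictionaries. The sketch below relies only on the attributes used in the printing loop above (word, start_sec, end_sec, phonemes, phoneme); alignments_to_dicts is a hypothetical helper name, not part of pvorca.

```python
from typing import Any, Dict, List


def alignments_to_dicts(alignments) -> List[Dict[str, Any]]:
    """Convert word alignment objects into JSON-serializable dictionaries."""
    return [
        {
            "word": token.word,
            "start_sec": token.start_sec,
            "end_sec": token.end_sec,
            "phonemes": [
                {
                    "phoneme": p.phoneme,
                    "start_sec": p.start_sec,
                    "end_sec": p.end_sec,
                }
                for p in token.phonemes
            ],
        }
        for token in alignments
    ]
```

The result can be passed directly to json.dumps().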
When done, make sure to explicitly release the resources using:
orca.delete()
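One way to make sure delete() is always called is to wrap engine creation in a context manager. This is a sketch under assumptions: managed_engine is a hypothetical helper, not something pvorca provides, and it only requires that the object returned by the factory has a delete() method.

```python
from contextlib import contextmanager


@contextmanager
def managed_engine(factory, **kwargs):
    """Create an engine with `factory(**kwargs)` and delete it on exit."""
    engine = factory(**kwargs)
    try:
        yield engine
    finally:
        engine.delete()  # runs even if the body of the with-block raises


# Intended usage (requires pvorca and a valid AccessKey):
#
#     import pvorca
#     with managed_engine(pvorca.create, access_key='${ACCESS_KEY}') as orca:
#         pcm, alignments = orca.synthesize(text='${TEXT}')
```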
Orca supports a wide range of English characters, including letters, numbers, symbols, and punctuation marks.
You can get a list of all supported characters with the .valid_characters property.
Pronunciations of characters or words not supported by this list can be achieved with custom pronunciations.
Orca allows embedding custom pronunciations in the text via the syntax {word|pronunciation}.
The pronunciation is expressed in ARPAbet phonemes.
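To illustrate the {word|pronunciation} syntax, the sketch below builds such a markup string. The custom_pronunciation helper is hypothetical, the space-separated phoneme format is an assumption, and the phoneme sequence for "custom" is an illustrative ARPAbet transcription rather than one taken from this document.

```python
from typing import Sequence


def custom_pronunciation(word: str, phonemes: Sequence[str]) -> str:
    """Wrap `word` in the {word|pronunciation} syntax.

    Assumption: phonemes are ARPAbet symbols separated by single spaces.
    """
    return "{" + word + "|" + " ".join(phonemes) + "}"


# Illustrative transcription of "custom" (an assumption, see above).
marked = custom_pronunciation("custom", ["K", "AH", "S", "T", "AH", "M"])
text = f"This is a {marked} pronunciation"
```

The resulting text can then be passed to the synthesis methods like any other input string.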
Orca Streaming Text-to-Speech can synthesize speech in different languages and with a variety of voices, each of which is characterized by a model file (.pv) located in lib/common. The language and gender of the speaker are indicated in the file name.
To create an instance of the engine with a specific language and voice, use:
orca = pvorca.create(access_key='${ACCESS_KEY}', model_path='${MODEL_PATH}')
and replace ${MODEL_PATH} with the path to the model file with the desired language/voice.
Orca allows for keyword arguments to control the synthesized speech. They can be provided to the stream_open method or the single synthesis methods synthesize and synthesize_to_file:
- speech_rate: Controls the speed of the generated speech. Valid values are within [0.7, 1.3]. A higher value produces faster speech and a lower value produces slower speech. The default is 1.0.
- random_state: Sets the random state for sampling during synthesis. This can be used to ensure that the synthesized speech is deterministic across different runs. Valid values are all non-negative integers. If not provided, a random seed will be chosen and the synthesis process will be non-deterministic.
To obtain the set of valid characters, call orca.valid_characters.
To retrieve the maximum number of characters allowed, call orca.max_character_limit.
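The two properties above can be used to check text before sending it to the engine. The following is a rough sketch: find_text_problems is a hypothetical helper, and it assumes that valid_characters lists every accepted character, whitespace included. Text containing custom pronunciation markup may need different handling.

```python
from typing import Iterable, List


def find_text_problems(
    text: str,
    valid_characters: Iterable[str],
    max_character_limit: int,
) -> List[str]:
    """Return human-readable problems found in `text` (empty list if none)."""
    problems: List[str] = []
    if len(text) > max_character_limit:
        problems.append(
            f"text has {len(text)} characters; limit is {max_character_limit}"
        )
    allowed = set(valid_characters)
    # Report each unsupported character once, in a stable order.
    unsupported = sorted({ch for ch in text if ch not in allowed})
    if unsupported:
        problems.append(
            "unsupported characters: " + " ".join(repr(ch) for ch in unsupported)
        )
    return problems


# Intended usage with an engine instance:
#
#     problems = find_text_problems(
#         text, orca.valid_characters, orca.max_character_limit)
```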
The sample rate of Orca is orca.sample_rate.
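The sample rate is needed whenever raw PCM is played back or stored. As a sketch, the raw PCM can be written to a single-channel 16-bit WAV file with Python's standard wave module. The write_wav helper is hypothetical, and it assumes the PCM is a sequence of signed 16-bit integer samples.

```python
import struct
import wave
from typing import Sequence


def write_wav(target, pcm: Sequence[int], sample_rate: int) -> None:
    """Write signed 16-bit mono samples to `target` (a path or file object)."""
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # single channel
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        # Pack samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Intended usage with an engine instance:
#
#     pcm, alignments = orca.synthesize(text='${TEXT}')
#     write_wav('${OUTPUT_PATH}', pcm, orca.sample_rate)
```

When the complete text is known up front, synthesize_to_file already covers this case; a helper like this is mainly useful for PCM collected in streaming mode.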
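Along with the raw PCM or saved audio file, Orca returns metadata for the synthesized audio in single synthesis mode.
The Orca.WordAlignment object has the following properties:
- word: The synthesized word.
- start_sec: Start time of the word in seconds.
- end_sec: End time of the word in seconds.
- phonemes: A list of Orca.PhonemeAlignment objects.
The Orca.PhonemeAlignment object has the following properties:
- phoneme: The synthesized phoneme.
- start_sec: Start time of the phoneme in seconds.
- end_sec: End time of the phoneme in seconds.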
pvorcademo provides command-line utilities for synthesizing audio using Orca.