
🌐 Website | 🤗 Hugging Face | 💬 Discord | 𝕏 X (Twitter) | 📰 Blog
OuteTTS supports the following backends:
| Backend | Type | Installation |
| --- | --- | --- |
| Llama.cpp Python Bindings | Python | ✅ Installed by default |
| Llama.cpp Server | Python | ✅ Installed by default |
| Llama.cpp Server Async (Batched) | Python | ✅ Installed by default |
| Hugging Face Transformers | Python | ✅ Installed by default |
| ExLlamaV2 & ExLlamaV2 Async (Batched) | Python | ❌ Requires manual installation |
| VLLM (Batched, experimental) | Python | ❌ Requires manual installation |
| Transformers.js | JavaScript | NPM package |
| Llama.cpp Directly | C++ | External library |
Tested with NVIDIA L40S GPU
OuteTTS now installs the llama.cpp Python bindings by default, so you must choose the installation that matches your hardware. For more detailed instructions on building llama.cpp, refer to the following resources: llama.cpp Build and llama.cpp Python.
```shell
# CPU (default)
pip install outetts --upgrade

# NVIDIA CUDA
CMAKE_ARGS="-DGGML_CUDA=on" pip install outetts --upgrade

# AMD ROCm (HIPBLAS)
CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install outetts --upgrade

# Vulkan
CMAKE_ARGS="-DGGML_VULKAN=on" pip install outetts --upgrade

# Apple Metal
CMAKE_ARGS="-DGGML_METAL=on" pip install outetts --upgrade
```
For a complete usage guide, refer to the interface documentation here:
> [!TIP]
> Currently, only one default English voice is available for testing.
You can easily create your own speaker profiles in just a few lines by following this guide:
```python
import outetts

# Initialize the interface
interface = outetts.Interface(
    config=outetts.ModelConfig.auto_config(
        model=outetts.Models.VERSION_1_0_SIZE_1B,
        # For llama.cpp backend
        backend=outetts.Backend.LLAMACPP,
        quantization=outetts.LlamaCppQuantization.FP16,
        # For transformers backend
        # backend=outetts.Backend.HF,
    )
)

# Load the default speaker profile
speaker = interface.load_default_speaker("EN-FEMALE-1-NEUTRAL")

# Or create your own speaker profiles in seconds and reuse them instantly
# speaker = interface.create_speaker("path/to/audio.wav")
# interface.save_speaker(speaker, "speaker.json")
# speaker = interface.load_speaker("speaker.json")

# Generate speech
output = interface.generate(
    config=outetts.GenerationConfig(
        text="Hello, how are you doing?",
        speaker=speaker,
    )
)

# Save to file
output.save("output.wav")
```
> [!IMPORTANT]
> **Important Sampling Considerations**
>
> When using OuteTTS version 1.0, it is crucial to use the settings specified in the Sampling Configuration section. The repetition penalty implementation is particularly important: this model requires penalization applied to a 64-token recent window, rather than across the entire context window. Penalizing the entire context will cause the model to produce broken or low-quality output.
>
> All necessary samplers and patches for all backends are set up automatically in the outetts library. If you use a custom implementation, make sure you implement these requirements correctly.
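As a sketch of what the 64-token window means in practice, the illustrative helper below (not outetts internals; the function name and signature are assumptions for this example) penalizes only tokens that appear among the most recently generated tokens:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1, window=64):
    """Penalize only tokens seen in the most recent `window` generated tokens.

    Illustrative sketch: `logits` is a list of floats indexed by token id,
    `generated_ids` is the list of token ids produced so far.
    """
    out = list(logits)
    # Only the last `window` tokens contribute to the penalty set
    for token_id in set(generated_ids[-window:]):
        if out[token_id] > 0:
            out[token_id] /= penalty  # shrink positive logits
        else:
            out[token_id] *= penalty  # push negative logits further down
    return out
```

Note that a token generated more than 64 steps ago is deliberately left untouched, which is exactly the behavior that full-context penalization would break.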
The model is designed to be used with a speaker reference. Without one, it generates random vocal characteristics, often leading to lower-quality outputs. The model inherits the referenced speaker's emotion, style, and accent. Therefore, when generating speech in other languages with the same speaker, you may observe the model retaining the original accent. For example, if you use a Japanese speaker and continue speech in English, the model may tend to use a Japanese accent.
It is recommended to create a speaker profile in the language you intend to use. This helps achieve the best results in that specific language, including tone, accent, and linguistic features.
While the model supports cross-lingual speech, it still relies on the reference speaker. If the speaker has a distinct accent, such as British English, other languages may carry that accent as well.
Testing shows that a temperature of 0.4 is an ideal starting point for accuracy (with the sampling settings below). However, some voice references may benefit from higher temperatures for enhanced expressiveness or slightly lower temperatures for more precise voice replication.
If the cloned voice quality is subpar, check the encoded speaker sample.
```python
interface.decode_and_save_speaker(speaker=your_speaker, path="speaker.wav")
```
The DAC audio reconstruction model is lossy, and samples with clipping, excessive loudness, or unusual vocal features may introduce encoding issues that impact output quality.
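A quick way to catch clipping or excessive loudness before encoding is to inspect the raw samples. The helper below is hypothetical (not part of outetts) and assumes float samples normalized to [-1, 1]:

```python
import math

def audio_health(samples, clip_threshold=0.99):
    """Report peak, RMS level, and clipped-sample count for float audio.

    Hypothetical helper: run it on your reference clip before calling
    create_speaker() to spot clipping or excessive loudness.
    """
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= clip_threshold)
    return {"peak": peak, "rms": rms, "clipped_samples": clipped}
```

If `clipped_samples` is nonzero or the RMS level is unusually high, consider re-recording or normalizing the clip before building a speaker profile.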
For optimal results with this TTS model, use the following sampling settings.
| Parameter | Value |
| --- | --- |
| Temperature | 0.4 |
| Repetition Penalty | 1.1 |
| Repetition Range | 64 |
| Top-k | 40 |
| Top-p | 0.9 |
| Min-p | 0.05 |
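For custom implementations, the truncation samplers in the table combine roughly as follows. This is an illustrative sketch only (outetts configures these automatically on supported backends), and `filter_logits` is a hypothetical helper:

```python
import math

def filter_logits(logits, top_k=40, top_p=0.9, min_p=0.05):
    """Return token ids that survive top-k, then top-p, then min-p filtering.

    Illustrative sketch: `logits` is a list of floats indexed by token id.
    """
    # Softmax over all logits
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep the k most probable tokens
    keep = order[:top_k]
    # Top-p (nucleus): keep the smallest prefix with cumulative prob >= top_p
    cum, nucleus = 0.0, []
    for i in keep:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Min-p: drop tokens below min_p times the top token's probability
    floor = min_p * probs[order[0]]
    return [i for i in nucleus if probs[i] >= floor]
```

Sampling then draws from the surviving tokens after renormalizing their probabilities and applying the temperature.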