
SpeechCraft
Create natural sounding audio from text, clone voices and use them. Convert voice to voice. Bark model.
Ever wanted to create natural sounding speech from text, clone a voice or sound like someone else? SpeechCraft is ideal for creating voice-overs, audiobooks, or just having fun.
Also check out other socaity projects for generative AI:
https://github.com/SocAIty/SpeechCraft/assets/7961324/dbf905ea-df37-4e52-9e93-a9833352459d
The "hermine" voice was cloned from the voice_clone_test_voice_1.wav file, which contains around 11 seconds of clear speech.
https://github.com/SocAIty/SpeechCraft/assets/7961324/71a039c7-e665-4576-91c7-729052e05b03
SpeechCraft is available on socaity.ai as part of the socaity SDK. Spare yourself the installation and use the SDK directly; no GPU is required.
```python
from socaity import speechcraft

audio = speechcraft().text2voice("I love society [laughs]! [happy] What a day to make voice overs with artificial intelligence.").get_result()
audio.save("i_love_socaity.wav")
```
```bash
# from PyPI (without web API)
pip install speechcraft
# with web API (quoted so shells like zsh don't expand the brackets)
pip install "speechcraft[full]"
# or from GitHub for the newest version
pip install git+https://github.com/SocAIty/speechcraft
```
To use a GPU, don't forget to install the GPU build of PyTorch that matches your CUDA version. For example:
```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
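A quick way to confirm that the installed PyTorch build actually sees your GPU is sketched below. It relies only on standard PyTorch calls (torch.cuda.is_available, torch.version.cuda, torch.cuda.get_device_name) and falls back to "cpu" when PyTorch is missing; the pick_device helper is hypothetical and not part of SpeechCraft.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and can see a GPU, else "cpu"."""
    # find_spec avoids an ImportError when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this PyTorch build was compiled against.
        print(f"CUDA {torch.version.cuda} build, GPU 0: {torch.cuda.get_device_name(0)}")
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the CPU-only PyTorch wheel is most likely installed and should be replaced with one matching your CUDA version.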
Requirements:
If you have pip >= 25.0.0, you will get an install error caused by the incompatible package metadata syntax of OmegaConf.
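The sketch below checks the installed pip version against the 25.0.0 threshold from the note above, using importlib.metadata from the standard library. The helper names are hypothetical, and the suggested downgrade to pip<25 is an assumption, not an officially documented fix.

```python
from importlib.metadata import PackageNotFoundError, version

# Threshold taken from the note above; adjust if the OmegaConf issue is fixed upstream.
PIP_CONFLICT_MAJOR = 25


def pip_major(version_string: str) -> int:
    """Extract the major version from strings like "25.0.1" or "24.3"."""
    return int(version_string.split(".")[0])


def pip_may_conflict(version_string: str) -> bool:
    """True if this pip version is at or above the threshold named in the README."""
    return pip_major(version_string) >= PIP_CONFLICT_MAJOR


def check_installed_pip() -> None:
    try:
        installed = version("pip")
    except PackageNotFoundError:
        print("pip is not installed in this environment.")
        return
    if pip_may_conflict(installed):
        # Assumed workaround (not from the SpeechCraft docs): pin an older pip.
        print(f'pip {installed} may fail on OmegaConf; try: python -m pip install "pip<25"')
    else:
        print(f"pip {installed} is below the reported threshold.")


if __name__ == "__main__":
    check_installed_pip()
```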
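Create a virtual environment with `python -m venv venv` and activate it with `venv/Scripts/activate` (Windows) or `source venv/bin/activate` (Linux/macOS).
Then run `pip install .` from the repository root.

We provide three ways to use the text-to-speech functionality: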
```python
from speechcraft import text2voice, voice2embedding, voice2voice

# simple text-to-speech synthesis
text = "I love society [laughs]! [happy] What a day to make voice overs with artificial intelligence."
audio_numpy, sample_rate = text2voice(text, speaker_name="en_speaker_3")

# speaker embedding generation
embedding = voice2embedding(audio_file="voice_sample_15s.wav", voice_name="hermine").save_to_speaker_lib()

# text-to-speech synthesis with a cloned voice or embedding
audio_with_cloned_voice, sample_rate = text2voice(text, voice=embedding)  # also works with voice="hermine"

# voice2voice synthesis
cloned_audio = voice2voice(audio_file="my_audio_file.wav", voice_name_or_embedding_path="hermine")
```
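Cloning works best with a clean sample of reasonable length (the hermine example used about 11 seconds of clear speech). A minimal sketch for checking a WAV file's duration before calling voice2embedding follows, using the standard-library wave module. The 10 to 20 second window and all helper names are hypothetical, and wave only reads integer PCM WAV files, not float WAVs.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file (the stdlib wave module rejects float WAVs)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def looks_like_good_clone_sample(path: str, min_s: float = 10.0, max_s: float = 20.0) -> bool:
    """Hypothetical rule of thumb: accept samples between min_s and max_s seconds."""
    return min_s <= wav_duration_seconds(path) <= max_s


def make_silent_wav(path: str, seconds: float, sample_rate: int = 16000) -> str:
    """Write mono 16-bit silence, handy for trying the checks without real audio."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return path


if __name__ == "__main__":
    demo = make_silent_wav(os.path.join(tempfile.gettempdir(), "clone_demo.wav"), 12.0)
    print(wav_duration_seconds(demo), looks_like_good_clone_sample(demo))
```

Duration is only one factor; a sample that passes this check can still clone poorly if it contains noise or multiple speakers.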
Use the following code to convert and save the audio file with the media-toolkit module.
```python
from media_toolkit import AudioFile

audio = AudioFile().from_np_array(audio_numpy, sr=sample_rate)
audio.save("my_new_audio.wav")
```
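If you prefer not to depend on media-toolkit, the standard library can write the same data. The sketch below swaps in the wave and array modules to store mono float samples in [-1.0, 1.0] as 16-bit PCM. It assumes a 1-D signal such as audio_numpy, and save_float_audio_as_wav is a hypothetical helper name. A NumPy array works as input but is converted sample by sample, so this is slower than a vectorized conversion.

```python
import array
import math
import os
import sys
import tempfile
import wave


def save_float_audio_as_wav(samples, sample_rate: int, path: str) -> str:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    pcm = array.array("h")
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip so out-of-range values don't overflow int16
        pcm.append(round(x * 32767))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return path


if __name__ == "__main__":
    sr = 24000  # assumption: upstream Bark generates 24 kHz audio
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s test tone
    print(save_float_audio_as_wav(tone, sr, os.path.join(tempfile.gettempdir(), "tone_440hz.wav")))
```

With SpeechCraft output, the call would be save_float_audio_as_wav(audio_numpy, sample_rate, "my_new_audio.wav").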
Note: The first time you use SpeechCraft, it will download the models. These files are quite large and can take a while to download.

The usage of the webservice is documented in WebService.md.
Bark has been tested and works on both CPU and GPU (PyTorch 2.0+, CUDA 11.7 and CUDA 12.0).
On enterprise GPUs and PyTorch nightly, Bark can generate audio in roughly real time. On older GPUs, default Colab, or CPU, inference can be significantly slower. For older GPUs or CPU, consider using the smaller models. Details can be found in our tutorial section.
The full version of Bark requires around 12GB of VRAM to hold everything on GPU at the same time.
To use smaller versions of the models, which should fit into 8 GB of VRAM, set the environment variable SUNO_USE_SMALL_MODELS=True.
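Because the flag must be set before the models are loaded, it is safest to set it in code before importing speechcraft; upstream Bark appears to read SUNO_USE_SMALL_MODELS once, when its generation module is imported. The hypothetical helper below picks the model size from the available VRAM, using the 12 GB figure above as the threshold and treating "no GPU" (None) as a reason to use the small models.

```python
import os
from typing import Optional


def configure_bark_model_size(vram_gb: Optional[float], full_model_vram_gb: float = 12.0) -> bool:
    """Enable the small models when there is no GPU or less VRAM than the full models need.

    Call this before speechcraft (and therefore Bark) is imported.
    """
    use_small = vram_gb is None or vram_gb < full_model_vram_gb
    if use_small:
        os.environ["SUNO_USE_SMALL_MODELS"] = "True"
    else:
        # Remove the variable instead of setting "False": a non-empty string may be read as enabled.
        os.environ.pop("SUNO_USE_SMALL_MODELS", None)
    return use_small


if __name__ == "__main__":
    print("small models:", configure_bark_model_size(8.0))
    # from speechcraft import text2voice  # import only after configuring the flag
```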
If you don't have hardware available or if you want to play with bigger versions of our models, you can also sign up for early access to our model playground here.
Bark is a fully generative text-to-audio model developed for research and demo purposes. It follows a GPT-style architecture similar to AudioLM and VALL-E and uses a quantized audio representation from EnCodec. It is not a conventional TTS model: it can deviate in unexpected ways from any given script. Unlike previous approaches, the input text prompt is converted directly to audio without the intermediate use of phonemes. It can therefore generalize to arbitrary instructions beyond speech, such as music lyrics, sound effects, or other non-speech sounds.
Below is a list of some known non-speech sounds, but we are finding more every day. If you find patterns that work particularly well, please let us know on Discord!
- [laughter]
- [laughs]
- [sighs]
- [music]
- [gasps]
- [clears throat]
- — or ... for hesitations
- ♪ for song lyrics
- [MAN] and [WOMAN] to bias Bark toward male and female speakers, respectively

Supported languages:

| Language | Status |
|---|---|
| English (en) | ✅ |
| German (de) | ✅ |
| Spanish (es) | ✅ |
| French (fr) | ✅ |
| Hindi (hi) | ✅ |
| Italian (it) | ✅ |
| Japanese (ja) | ✅ |
| Korean (ko) | ✅ |
| Polish (pl) | ✅ |
| Portuguese (pt) | ✅ |
| Russian (ru) | ✅ |
| Turkish (tr) | ✅ |
| Chinese, simplified (zh) | ✅ |
To use a different language, pass the corresponding speaker preset, for example speaker_name="de_speaker_1". You can find the preset voices and languages in the assets folder.
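The cues and language codes above can be checked before synthesis. The sketch below flags bracketed cues that are not in the documented list (they may still work, as with [happy] in the usage example) and builds preset names following the "de_speaker_1" pattern. The helper names are hypothetical, and the text2voice call is only shown as a comment because it needs SpeechCraft installed.

```python
import re

# Non-speech cues listed above; the list is not exhaustive.
KNOWN_CUES = {"laughter", "laughs", "sighs", "music", "gasps", "clears throat", "MAN", "WOMAN"}

# Language codes from the table above.
SUPPORTED_LANGUAGES = {"en", "de", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh"}


def unknown_cues(prompt: str) -> list[str]:
    """Return bracketed cues that are not in KNOWN_CUES (they may still work)."""
    return [cue for cue in re.findall(r"\[([^\[\]]+)\]", prompt) if cue not in KNOWN_CUES]


def speaker_preset(lang: str, index: int) -> str:
    """Build a preset name like "de_speaker_1" for a supported language code."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {lang!r}")
    if index < 0:
        raise ValueError("speaker index must be non-negative")
    return f"{lang}_speaker_{index}"


if __name__ == "__main__":
    prompt = "Hallo! [laughs] Was für ein schöner Tag. [happy] ♪ la la la ♪"
    print(speaker_preset("de", 1), unknown_cues(prompt))
    # With SpeechCraft installed, the prompt could then be synthesized, e.g.:
    # audio_numpy, sample_rate = text2voice(prompt, speaker_name=speaker_preset("de", 1))
```

Unknown cues are reported rather than rejected, since the list of working cues keeps growing.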
SpeechCraft and Bark are licensed under the MIT License.
Make sure these things are NOT in your voice input: (in no particular order)
What makes for good prompt audio? (in no particular order)
This repository is a merge of the original bark repository and bark-voice-cloning-HuBert-quantizer by gitmylo. The credit goes to the original authors. Like the original authors, I am not responsible for any misuse of this repository. Use it at your own risk and act responsibly: don't copy and publish the voice of a person without their consent.
Any help with maintaining and extending the package is welcome. Feel free to open an issue or a pull request.