
Company News
Socket Named Top Sales Organization by RepVue
Socket won two 2026 Reppy Awards from RepVue, ranking in the top 5% of all sales orgs. AE Alexandra Lister shares what it's like to grow a sales career here.
facebook/hf-seamless-m4t-medium
Advanced tools
SeamlessM4T is a collection of models designed to provide high quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text.
This repository hosts 🤗 Hugging Face's implementation of SeamlessM4T. You can find the original weights, as well as a guide on how to run them in the original hub repositories (large and medium checkpoints).
🌟 SeamlessM4T v2, an improved version of this version with a novel architecture, has been released here. This new model improves over SeamlessM4T v1 in quality as well as inference speed in speech generation tasks.
SeamlessM4T v2 is also supported by 🤗 Transformers, more on it in the model card of this new version or directly in 🤗 Transformers docs.
SeamlessM4T Medium covers:
This is the "medium" variant of the unified model, which enables multiple tasks without relying on multiple separate models:
You can perform all the above tasks from one single model, SeamlessM4TModel, but each task also has its own dedicated sub-model.
First, load the processor and a checkpoint of the model:
>>> from transformers import AutoProcessor, SeamlessM4TModel
>>> processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
>>> model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")
You can seamlessly use this model on text or on audio, to generated either translated text or translated audio.
Here is how to use the processor to process text and audio:
>>> # let's load an audio sample from an Arabic speech corpus
>>> from datasets import load_dataset
>>> dataset = load_dataset("arabic_speech_corpus", split="test", streaming=True)
>>> audio_sample = next(iter(dataset))["audio"]
>>> # now, process it
>>> audio_inputs = processor(audios=audio_sample["array"], return_tensors="pt")
>>> # now, process some English test as well
>>> text_inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
SeamlessM4TModel can seamlessly generate text or speech with few or no changes. Let's target Russian voice translation:
>>> audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
>>> audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
With basically the same code, I've translated English text and Arabic speech to Russian speech samples.
Similarly, you can generate translated text from audio files or from text with the same model. You only have to pass generate_speech=False to SeamlessM4TModel.generate.
This time, let's translate to French.
>>> # from audio
>>> output_tokens = model.generate(**audio_inputs, tgt_lang="fra", generate_speech=False)
>>> translated_text_from_audio = processor.decode(output_tokens[0].tolist(), skip_special_tokens=True)
>>> # from text
>>> output_tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
>>> translated_text_from_text = processor.decode(output_tokens[0].tolist(), skip_special_tokens=True)
SeamlessM4TModel is transformers top level model to generate speech and text, but you can also use dedicated models that perform the task without additional components, thus reducing the memory footprint.
For example, you can replace the audio-to-audio generation snippet with the model dedicated to the S2ST task, the rest is exactly the same code:
>>> from transformers import SeamlessM4TForSpeechToSpeech
>>> model = SeamlessM4TForSpeechToSpeech.from_pretrained("facebook/hf-seamless-m4t-medium")
Or you can replace the text-to-text generation snippet with the model dedicated to the T2TT task, you only have to remove generate_speech=False.
>>> from transformers import SeamlessM4TForTextToText
>>> model = SeamlessM4TForTextToText.from_pretrained("facebook/hf-seamless-m4t-medium")
Feel free to try out SeamlessM4TForSpeechToText and SeamlessM4TForTextToSpeech as well.
You have the possibility to change the speaker used for speech synthesis with the spkr_id argument. Some spkr_id works better than other for some languages!
You can use different generation strategies for speech and text generation, e.g .generate(input_ids=input_ids, text_num_beams=4, speech_do_sample=True) which will successively perform beam-search decoding on the text model, and multinomial sampling on the speech model.
Use return_intermediate_token_ids=True with SeamlessM4TModel to return both speech and text !
FAQs
Unknown package
We found that facebook/hf-seamless-m4t-medium demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Company News
Socket won two 2026 Reppy Awards from RepVue, ranking in the top 5% of all sales orgs. AE Alexandra Lister shares what it's like to grow a sales career here.

Security News
NIST will stop enriching most CVEs under a new risk-based model, narrowing the NVD's scope as vulnerability submissions continue to surge.

Company News
/Security News
Socket is an initial recipient of OpenAI's Cybersecurity Grant Program, which commits $10M in API credits to defenders securing open source software.