
omnilingual-asr
Photographs captured during corpus creation efforts in Pakistan and Liberia.
Omnilingual ASR is an open-source speech recognition system supporting over 1,600 languages — including hundreds never previously covered by any ASR technology. Designed for broad accessibility, it enables new languages to be added with just a few paired examples without requiring specialized expertise or large datasets. By combining scalable zero-shot learning with a flexible model family, Omnilingual ASR aims to make speech technology more inclusive and adaptable for communities and researchers worldwide.
Our 7B-LLM-ASR system achieves state-of-the-art performance across 1,600+ languages, with character error rates (CER) below 10 for 78% of those languages.
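For readers unfamiliar with the metric, the sketch below computes a character error rate using the common textbook definition: character-level edit distance divided by the reference length, expressed as a percentage. It assumes the "below 10" figure above is a percentage. The function names are illustrative, and this is not the project's evaluation script, which may normalize text differently before scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters -> 50.0
```

Under this definition a CER can exceed 100 when the hypothesis is much longer than the reference.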
We release two suites of models:
- Limited audio length models (omniASR_{CTC,LLM}_{300M,1B,3B,7B}_v2).
- Unlimited audio length models (omniASR_LLM_Unlimited_{300M,1B,3B,7B}_v2). These are briefly described in the architecture overview section. Their accuracy is comparable to that of the limited audio length models; however, finetuning recipes for them are not currently supported.
The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (Mac: brew install libsndfile; Windows may need additional setup).
# using pip
pip install omnilingual-asr
# using uv
uv add omnilingual-asr
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_Unlimited_7B_v2")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
More details on running specific models can be found in the src/omnilingual_asr/models/inference directory.
⚠️ Important: Currently only audio files shorter than 40 seconds are accepted for inference on CTC and LLM model suites.
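One way to work within this limit is to split a long recording into shorter pieces before transcription. The sketch below is only an illustration: `split_waveform` is a hypothetical helper, not part of the library, and it cuts at fixed offsets, so a word that straddles a boundary may be transcribed poorly. It emits dictionaries with `waveform` and `sample_rate` keys, the same shape used for in-memory audio in the dataset example in this README. The unlimited audio length models described above avoid the need for this step.

```python
from typing import Sequence


def split_waveform(waveform: Sequence[float], sample_rate: int,
                   max_seconds: float = 39.0) -> list[dict]:
    """Split a 1-D waveform into consecutive chunks of at most max_seconds.

    The default of 39 s is an arbitrary margin that keeps every chunk
    strictly shorter than the 40-second limit.
    """
    chunk_len = int(sample_rate * max_seconds)  # samples per chunk
    if chunk_len <= 0:
        raise ValueError("sample_rate and max_seconds must give a positive chunk length")
    return [
        {"waveform": waveform[start:start + chunk_len], "sample_rate": sample_rate}
        for start in range(0, len(waveform), chunk_len)
    ]


# Example: 100 seconds of silence at a toy sample rate of 100 Hz.
chunks = split_waveform([0.0] * 10_000, sample_rate=100)
print([len(c["waveform"]) / c["sample_rate"] for c in chunks])
```

The resulting list could then be passed to `pipeline.transcribe` and the per-chunk transcriptions joined in order, under the assumption that the pipeline accepts plain sequences as waveforms; convert to the array type it expects if it does not.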
To view the full list of 1600+ supported languages, you can access the language list programmatically:
from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)
# Check if a specific language is supported
if "eng_Latn" in supported_langs:
    print("English (Latin script) is supported!")
Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
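A tag in this format can be split into its two parts with a few lines of Python. The helper below is a hypothetical illustration, not part of the library. It assumes the language code never contains an underscore, so it splits on the first one only. The codes in the examples look like ISO 639-3 language codes and ISO 15924 script codes, but that reading is an inference from the examples rather than something stated here.

```python
from typing import NamedTuple


class LangTag(NamedTuple):
    language: str  # e.g. "eng"
    script: str    # e.g. "Latn"


def parse_lang_tag(tag: str) -> LangTag:
    """Split a '{language_code}_{script}' tag on its first underscore."""
    language, sep, script = tag.partition("_")
    if not sep or not language or not script:
        raise ValueError(f"expected '{{language_code}}_{{script}}', got {tag!r}")
    return LangTag(language, script)


print(parse_lang_tag("eng_Latn"))
print(parse_lang_tag("cmn_Hans").script)
```

Grouping the supported tags by `script` or by `language` is then a matter of iterating over the language list and collecting the parsed parts.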
We provide a large-scale multilingual speech dataset on HuggingFace under CC-BY-4.0 License: facebook/omnilingual-asr-corpus.
This dataset can be directly used with our inference pipeline for evaluation or testing:
pip install "omnilingual-asr[data]"
from datasets import load_dataset
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline
# Load dataset for a specific language (e.g., Ligurian)
omni_dataset = load_dataset("facebook/omnilingual-asr-corpus", "lij_Latn", split="train", streaming=True)
batch = next(omni_dataset.iter(5))
# Convert to pipeline input format
audio_data = [{"waveform": x["array"], "sample_rate": x["sampling_rate"]}
              for x in batch["audio"]]
# Run inference
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B_v2")
transcriptions = pipeline.transcribe(audio_data, batch_size=2)
# Display results
for i, (transcription, original_text) in enumerate(zip(transcriptions, batch["raw_text"]), 1):
    print(f"\n Sample {i}:")
    print(f" Ground Truth: {original_text}")
    print(f" Predicted: {transcription}")
| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
| omniASR_W2V_300M | SSL | 317_390_592 | 1.2 GiB | | |
| omniASR_W2V_1B | SSL | 965_514_752 | 3.6 GiB | | |
| omniASR_W2V_3B | SSL | 3_064_124_672 | 12.0 GiB | | |
| omniASR_W2V_7B | SSL | 6_488_487_168 | 25.0 GiB | | |
| omniASR_CTC_300M | ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
| omniASR_CTC_1B | ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
| omniASR_CTC_3B | ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
| omniASR_CTC_7B | ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
| omniASR_CTC_300M_v2 | ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
| omniASR_CTC_1B_v2 | ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
| omniASR_CTC_3B_v2 | ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
| omniASR_CTC_7B_v2 | ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
| omniASR_LLM_300M | ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
| omniASR_LLM_1B | ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
| omniASR_LLM_3B | ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
| omniASR_LLM_7B | ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
| omniASR_LLM_300M_v2 | ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
| omniASR_LLM_1B_v2 | ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
| omniASR_LLM_3B_v2 | ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
| omniASR_LLM_7B_v2 | ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
| omniASR_LLM_Unlimited_300M_v2 | omniASR_LLM_300M + unlimited audio length | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.092 (~1x) (0.206)³ |
| omniASR_LLM_Unlimited_1B_v2 | omniASR_LLM_1B + unlimited audio length | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.097 (~1x) (0.207)³ |
| omniASR_LLM_Unlimited_3B_v2 | omniASR_LLM_3B + unlimited audio length | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.095 (~1x) (0.208)³ |
| omniASR_LLM_Unlimited_7B_v2 | omniASR_LLM_7B + unlimited audio length | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.097 (~1x) (0.208)³ |
| omniASR_LLM_7B_ZS | Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| omniASR_tokenizer_v1 | Tokenizer for all non-v2 models except omniASR_LLM_7B | - | 100 KiB | - | |
| omniASR_tokenizer_v1_variant7 | Tokenizer for the omniASR_LLM_7B architecture | - | 100 KiB | - | |
| omniASR_tokenizer_written_v2 | Tokenizer for all v2 architectures | - | 100 KiB | - | |
¹ (batch=1, audio_len=30s, BF16, A100)
² Relative speed to omniASR_LLM_7B
³ (batch=1, audio_len=15min, BF16, A100)
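To make the last column easier to read, here is a small sketch of the arithmetic behind it. It assumes the usual definition of real-time factor (processing time divided by audio duration), which the table itself does not spell out, and it takes the relative speed to be the reference model's RTF divided by the model's own RTF. Both function names are illustrative.

```python
def estimated_processing_seconds(rtf: float, audio_seconds: float) -> float:
    """Estimated compute time for a clip, assuming RTF = processing time / audio duration."""
    return rtf * audio_seconds


def relative_speed(rtf: float, reference_rtf: float) -> float:
    """How many times faster a model is than the reference (values above 1 mean faster)."""
    return reference_rtf / rtf


# omniASR_LLM_7B on a 30-second clip, under the benchmark conditions of footnote 1.
print(round(estimated_processing_seconds(0.092, 30.0), 2))  # 2.76

# omniASR_LLM_7B_ZS relative to omniASR_LLM_7B; the table lists this as ~0.5x.
print(round(relative_speed(0.194, reference_rtf=0.092), 2))
```

Because the table rounds RTF to three decimals, ratios recomputed from it will not exactly match the listed speeds: 0.092 / 0.001 gives 92, whereas the table lists 96x for the 300M CTC models. Timings on other hardware, batch sizes, or precisions will differ from these figures.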
Model assets are cached in ~/.cache/fairseq2/assets/.
We provide a high-level model architecture overview in the model directory (src/omnilingual_asr/models), with individual configurations for each model family in the respective directories:
- src/omnilingual_asr/models/wav2vec2_ssl
- src/omnilingual_asr/models/wav2vec2_asr
- src/omnilingual_asr/models/wav2vec2_llama
To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.
Omnilingual ASR code and models are released under the Apache 2.0 license.
If you use the Omnilingual ASR model suite in your research and wish to cite us, please use the following BibTeX entry:
@misc{omnilingualasrteam2025omnilingualasropensourcemultilingual,
title={Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages},
author={Omnilingual ASR team and Gil Keren and Artyom Kozhevnikov and Yen Meng and Christophe Ropers and Matthew Setzler and Skyler Wang and Ife Adebara and Michael Auli and Can Balioglu and Kevin Chan and Chierh Cheng and Joe Chuang and Caley Droof and Mark Duppenthaler and Paul-Ambroise Duquenne and Alexander Erben and Cynthia Gao and Gabriel Mejia Gonzalez and Kehan Lyu and Sagar Miglani and Vineel Pratap and Kaushik Ram Sadagopan and Safiyyah Saleem and Arina Turkatenko and Albert Ventayol-Boada and Zheng-Xin Yong and Yu-An Chung and Jean Maillard and Rashel Moritz and Alexandre Mourachko and Mary Williamson and Shireen Yates},
year={2025},
eprint={2511.09690},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.09690},
}