Meta Launches Omnilingual ASR: A Groundbreaking Open-Source Speech Recognition System Supporting 1,600+ Languages

Meta has made a significant leap in artificial intelligence with the release of its Omnilingual Automatic Speech Recognition (ASR) system, which natively supports more than 1,600 languages. The project not only dwarfs existing models such as OpenAI’s Whisper, which supports 99 languages, but also introduces zero-shot in-context learning, which lets developers extend the model to thousands of additional languages without retraining. That capability broadens the system’s potential coverage to over 5,400 languages, essentially every spoken language with a known script.

The Omnilingual ASR system represents a paradigm shift from static model capabilities to a flexible framework that communities can adapt and expand. While the initial training covers 1,600 languages, the broader figure reflects the system’s ability to generalize on demand, making it the most extensible speech recognition system released to date. This flexibility is particularly crucial in a world where linguistic diversity is often overlooked in technological advancements.

One of the standout features of Omnilingual ASR is its open-source release under a plain Apache 2.0 license. This contrasts sharply with Meta’s previous releases, such as the Llama models, whose community license restricted the largest enterprises unless they obtained a separate license from Meta. By removing these barriers, Meta empowers researchers and developers to adopt the technology freely, even in commercial and enterprise-grade projects. This move aligns with Meta’s broader mission to break down language barriers, expand digital access, and empower communities worldwide.

Released on November 10, 2025, the Omnilingual ASR suite includes a family of speech recognition models, a 7-billion-parameter multilingual audio representation model, and a massive speech corpus that spans over 350 previously underserved languages. All resources are available under open licenses, and the models support speech-to-text transcription out of the box. Meta’s commitment to open sourcing these models and datasets aims to democratize access to advanced speech recognition technology, particularly for low-resource languages that have historically been neglected.

At its core, the Omnilingual ASR is designed for speech-to-text transcription. The models are trained to convert spoken language into written text, supporting a wide range of applications including voice assistants, transcription tools, subtitles, oral archive digitization, and accessibility features for low-resource languages. Unlike earlier ASR models that required extensive labeled training data, the Omnilingual ASR includes a zero-shot variant capable of transcribing languages it has never encountered before. This innovative approach dramatically lowers the barrier for adding new or endangered languages, eliminating the need for large corpora or retraining.
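To make the workflow concrete, here is a minimal sketch of what invoking such a pipeline typically looks like in Python. The package, class, checkpoint, and method names below are illustrative assumptions based on the release description, not the confirmed API:

```python
# Illustrative sketch only -- the package, class, and checkpoint names
# are assumptions, not the confirmed omnilingual-asr API.
# pip install omnilingual-asr

from omnilingual_asr import ASRPipeline  # hypothetical import path

# Load one of the released checkpoints (identifier is a placeholder).
pipeline = ASRPipeline.from_pretrained("omniASR_LLM_7B")

# Transcribe a local recording; conditioning on a language code is the
# kind of option the release describes for improving accuracy.
text = pipeline.transcribe("recording.wav", lang="yor")  # ISO 639-3 code
print(text)
```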

The technical design of the Omnilingual ASR suite is robust, comprising multiple model families trained on more than 4.3 million hours of audio from over 1,600 languages. These include wav2vec 2.0 models for self-supervised speech representation learning, CTC-based ASR models for efficient supervised transcription, and LLM-ASR models that combine a speech encoder with a Transformer-based text decoder for state-of-the-art transcription. The inclusion of the LLM-ZeroShot ASR model enables inference-time adaptation to unseen languages, further enhancing the system’s versatility.
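The relationship between these families can be sketched schematically. The toy PyTorch below is not Meta’s implementation; it only contrasts the two supervised decoding styles on top of a shared speech encoder: a CTC head that classifies every encoder frame in parallel, and an LLM-style Transformer decoder that cross-attends to the encoder output and generates text autoregressively:

```python
# Toy contrast of the two supervised decoder styles (not Meta's code).
import torch
import torch.nn as nn

VOCAB, D_MODEL = 256, 512  # toy vocabulary size and encoder width

class CTCHead(nn.Module):
    """CTC-style ASR: one projection per encoder frame, decoded in
    parallel -- fast and cheap, trained with CTC loss."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(D_MODEL, VOCAB + 1)  # +1 for the CTC blank

    def forward(self, frames):              # frames: (batch, time, D_MODEL)
        return self.proj(frames).log_softmax(-1)

class LLMStyleDecoder(nn.Module):
    """LLM-ASR style: a Transformer text decoder cross-attends to the
    speech encoder's frames and emits tokens autoregressively."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerDecoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, tokens, frames):      # tokens: (batch, seq) of ids
        x = self.decoder(self.embed(tokens), memory=frames)
        return self.lm_head(x)

frames = torch.randn(1, 200, D_MODEL)      # stand-in for wav2vec 2.0 output
print(CTCHead()(frames).shape)             # (1, 200, 257)
print(LLMStyleDecoder()(torch.zeros(1, 5, dtype=torch.long), frames).shape)
```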

The scale of this release is particularly noteworthy. While Whisper and similar models have made strides in advancing ASR capabilities for widely spoken languages, they fall short when it comes to the long tail of human linguistic diversity. Meta’s system directly supports over 1,600 languages and can generalize to more than 5,400 languages using in-context learning. It achieves character error rates (CER) below 10% in 78% of supported languages, with more than 500 of those languages never previously covered by any ASR model, according to Meta’s research paper. This expansion opens up new possibilities for communities whose languages are often excluded from digital tools, allowing for greater inclusivity and representation in the digital landscape.
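For context, CER is the character-level Levenshtein edit distance between the model’s output and a reference transcript, normalized by the reference length; a minimal implementation makes the 10% threshold concrete:

```python
# Character error rate: edit distance / reference length.
def cer(reference: str, hypothesis: str) -> float:
    r, h = reference, hypothesis
    # dp[i][j] = edits turning the first i ref chars into the first j hyp chars
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(r)][len(h)] / max(len(r), 1)

print(cer("omnilingual", "omnilinguol"))  # 1 edit / 11 chars ~ 0.09, under 10%
```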

The release of Omnilingual ASR comes at a critical juncture for Meta’s AI strategy, following a year marked by organizational turbulence, leadership changes, and mixed reviews of its previous model, Llama 4. The challenges faced by Meta in the AI space prompted CEO Mark Zuckerberg to appoint Alexandr Wang, co-founder and former CEO of AI data supplier Scale AI, as Chief AI Officer. This strategic move was part of a broader effort to revitalize Meta’s AI division and restore its reputation in the field of multilingual AI.

Omnilingual ASR signifies a return to a domain where Meta has historically excelled—multilingual AI—and offers a truly extensible, community-oriented stack with minimal barriers to entry. The system’s support for over 1,600 languages and its capacity to extend to thousands more via zero-shot in-context learning reaffirm Meta’s engineering credibility in language technology. Importantly, this release is accompanied by transparent dataset sourcing and reproducible training protocols, aligning with Meta’s commitment to ethical AI development.

To achieve this scale, Meta collaborated with researchers and community organizations across Africa, Asia, and other regions to create the Omnilingual ASR Corpus, a 3,350-hour dataset encompassing 348 low-resource languages. Contributors were local speakers who were compensated for their recordings, which were gathered in partnership with groups such as African Next Voices, Mozilla Foundation’s Common Voice, and Lanfrica/NaijaVoices. The data collection focused on natural, unscripted speech, with culturally relevant and open-ended prompts designed to elicit authentic responses. Quality assurance was built into every step of the transcription process, ensuring high standards for the resulting datasets.

Performance benchmarks for the Omnilingual ASR models show strong results even in low-resource scenarios. The largest model in the suite, the omniASR_LLM_7B, requires approximately 17GB of GPU memory for inference, making it suitable for deployment on high-end hardware. Smaller models, ranging from 300 million to 1 billion parameters, can run on lower-power devices while still delivering real-time transcription speeds. The system demonstrates robustness in noisy conditions and unseen domains, particularly when fine-tuned for specific applications.
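The 17GB figure squares with simple arithmetic: 7 billion parameters stored in 16-bit precision occupy roughly 13GiB on their own, with the remainder going to activations and runtime overhead (the exact split here is an assumption):

```python
# Back-of-the-envelope check on the ~17GB figure for omniASR_LLM_7B.
params = 7e9
bytes_per_param = 2  # fp16 / bf16 weights
weights_gib = params * bytes_per_param / 1024**3
print(f"weights alone: {weights_gib:.1f} GiB")               # ~13.0 GiB
print(f"headroom below 17 GiB: {17 - weights_gib:.1f} GiB")  # activations etc.
```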

The zero-shot system, omniASR_LLM_7B_ZS, allows users to transcribe new languages with minimal setup. By providing just a few sample audio-text pairs, the model can generate transcriptions for new utterances in the same language, making it an invaluable tool for developers looking to expand their language offerings quickly and efficiently.
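In code, this amounts to handing the model a few paired examples alongside the new audio at inference time, with no gradient updates and no retraining. Again, the class and argument names below are assumptions for illustration, not the confirmed interface:

```python
# Illustrative zero-shot in-context sketch -- names are assumptions.
from omnilingual_asr import ASRPipeline  # hypothetical import path

# A few paired (audio, transcript) examples in the target language.
few_shot_examples = [
    ("sample1.wav", "reference transcript one"),
    ("sample2.wav", "reference transcript two"),
    ("sample3.wav", "reference transcript three"),
]

# The zero-shot variant conditions on the examples at inference time.
zs = ASRPipeline.from_pretrained("omniASR_LLM_7B_ZS")
text = zs.transcribe("new_utterance.wav", context=few_shot_examples)
print(text)
```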

All models and datasets associated with the Omnilingual ASR are licensed under permissive terms, including the Apache 2.0 license for models and code, and CC-BY 4.0 for the Omnilingual ASR Corpus available on Hugging Face. Installation is straightforward, supported via PyPI, and Meta provides additional resources such as a Hugging Face dataset integration, pre-built inference pipelines, and language-code conditioning for improved accuracy.
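A plausible getting-started flow combines the PyPI install with the Hugging Face corpus. Both the package name and the dataset identifier below are placeholders, so check the official release for the exact strings:

```python
# pip install omnilingual-asr datasets   (package names are assumptions)

from datasets import load_dataset

# Stream the CC-BY-4.0 corpus; the dataset id below is a placeholder.
corpus = load_dataset("facebook/omnilingual-asr-corpus",
                      split="train", streaming=True)
sample = next(iter(corpus))
print(sample.keys())  # expected fields like audio, text, language (assumed)
```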

For enterprise developers, especially those operating in multilingual or international markets, the Omnilingual ASR significantly lowers the barrier to deploying speech-to-text systems across a broader range of customers and geographies. Instead of relying on commercial ASR APIs that typically support only a narrow set of high-resource languages, teams can now integrate an open-source pipeline that covers over 1,600 languages out of the box, with the option to extend it to thousands more through zero-shot learning.

This flexibility is particularly valuable for enterprises working in sectors such as voice-based customer support, transcription services, accessibility, education, and civic technology, where local language coverage can be both a competitive advantage and a regulatory necessity. The models’ permissive Apache 2.0 license allows businesses to fine-tune, deploy, or integrate them into proprietary systems without restrictive terms, marking a significant shift in the ASR landscape from centralized, cloud-gated offerings to community-extendable infrastructure.

By making multilingual speech recognition more accessible, customizable, and cost-effective, Omnilingual ASR opens the door to a new generation of enterprise speech applications built around linguistic inclusion rather than limitation. This release not only enhances Meta’s standing in the AI community but also sets a new standard for what is possible in the realm of speech recognition technology.

In conclusion, Meta’s Omnilingual ASR is more than just a technological advancement; it represents a commitment to inclusivity, accessibility, and community empowerment in the digital age. By prioritizing open-source principles and community-driven development, Meta is paving the way for a future where language barriers are diminished, and everyone has the opportunity to participate in the digital conversation. As the world becomes increasingly interconnected, the importance of tools that facilitate communication across languages cannot be overstated, and the Omnilingual ASR stands at the forefront of this movement.