

Meta AI
About SeamlessM4T

- SeamlessM4T is the first all-in-one multilingual multimodal AI translation and transcription model.

  • This single model can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages depending on the task.

The world we live in has never been more interconnected, giving people access to more multilingual content than ever before. This also makes the ability to communicate and understand information in any language increasingly important.

Today, we’re introducing SeamlessM4T, the first all-in-one multimodal and multilingual AI translation model that allows people to communicate effortlessly through speech and text across different languages. SeamlessM4T supports:

  • Speech recognition for nearly 100 languages
  • Speech-to-text translation for nearly 100 input and output languages
  • Speech-to-speech translation, supporting nearly 100 input languages and 36 (including English) output languages
  • Text-to-text translation for nearly 100 languages
  • Text-to-speech translation, supporting nearly 100 input languages and 35 (including English) output languages

In keeping with our approach to open science, we’re publicly releasing SeamlessM4T under a research license to allow researchers and developers to build on this work. We’re also releasing the metadata of SeamlessAlign, the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments.

Building a universal language translator, like the fictional Babel Fish in The Hitchhiker’s Guide to the Galaxy, is challenging because existing speech-to-speech and speech-to-text systems only cover a small fraction of the world’s languages. But we believe the work we’re announcing today is a significant step forward in this journey. Compared to approaches using separate models, SeamlessM4T’s single system approach reduces errors and delays, increasing the efficiency and quality of the translation process. This enables people who speak different languages to communicate with each other more effectively.
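The claim that a single system reduces errors and delays can be made concrete with a toy model of a cascaded speech-to-speech system: speech recognition, then text translation, then speech synthesis. This is a minimal sketch under stated assumptions, not a description of how SeamlessM4T or any real cascade behaves. The stage names, accuracies, and latencies are hypothetical. Stage errors are assumed independent, and the stages are assumed to run one after another.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Stage:
    name: str
    accuracy: float    # hypothetical probability that this stage's output is correct
    latency_ms: float  # hypothetical processing time for this stage


def pipeline_estimate(stages):
    """Toy model of a cascade.

    Assumes stage errors are independent, so end-to-end accuracy is the product
    of per-stage accuracies. Assumes stages run sequentially, so latencies add.
    """
    accuracy = prod(s.accuracy for s in stages)
    latency = sum(s.latency_ms for s in stages)
    return accuracy, latency


# Hypothetical cascade for speech-to-speech translation (all numbers are illustrative).
cascade = [
    Stage("speech recognition", 0.90, 300.0),
    Stage("text translation", 0.90, 200.0),
    Stage("speech synthesis", 0.95, 250.0),
]

if __name__ == "__main__":
    accuracy, latency = pipeline_estimate(cascade)
    print(f"end-to-end accuracy {accuracy:.4f}, latency {latency:.0f} ms")
```

Under these assumptions, three stages that are each 90 to 95% accurate give only about 77% end-to-end accuracy, and their latencies add up to 750 ms. A single model avoids handing one stage's mistakes to the next. Its own accuracy and speed still have to be measured rather than assumed.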

SeamlessM4T is not recommended for long-form translation or for translation in specialized professional fields, because its accuracy there is limited.
A key advance of SeamlessM4T is that it does not rely on a chain of intermediate models to produce results. This makes it more flexible and even lets it handle switching languages on the fly.
Community Posts
doom guy
SeamlessM4T reaches a new state of the art, achieving a 20% improvement in BLEU score over previous models on speech-to-text translation into English.
master chief117
Speech is a richer medium than text, conveying more information through intonation, expression, and interaction, making speech translation challenging, but also more natural and social.