Xiaomi Launches MiMo V2.5 Voice AI With Advanced TTS and ASR

Xiaomi has rolled out a major update to its MiMo voice AI platform, unveiling the MiMo-V2.5 series with new models for text-to-speech and speech recognition aimed at real-world, agent-driven AI applications.

The update expands on Xiaomi’s earlier MiMo-V2 release and positions the company more aggressively in the fast-growing voice AI space, where natural interaction, multilingual support, and low-latency performance are becoming critical.

New text-to-speech models

The MiMo-V2.5-TTS lineup includes three models, currently offered with limited free access through Xiaomi’s MiMo Open Platform.

The base model offers preset voices with adjustable speed, pitch, and emotional tone. A second variant, VoiceDesign, allows users to generate entirely new voice timbres from a short text input.

The most advanced version, VoiceClone, can recreate a specific voice using only a small number of samples while maintaining consistency across different speaking styles.

Xiaomi said the system responds to natural language instructions rather than fixed parameters, allowing users to describe vocal delivery in plain terms.

It also supports script-style inputs for games, animation, and audio dramas, with fine control over characters, scenes, and dialogue. Inline audio tags enable emotion or emphasis changes within individual sentences, in both Chinese and English.
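To give a rough sense of how inline tags can steer delivery inside a single line of dialogue, here is a minimal Python sketch. The square-bracket tag syntax, the tag names, and the `split_tagged_text` helper are all invented for illustration; Xiaomi's actual tag format is not described here and may look quite different.

```python
import re

# Hypothetical tag syntax: "[tag]" placed inside a line of dialogue.
# This is an illustrative assumption, not Xiaomi's documented format.
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def split_tagged_text(text, default_style="neutral"):
    """Split text into (style, chunk) pairs.

    Each tag sets the style for the text that follows it, until the
    next tag or the end of the input.
    """
    segments = []
    style = default_style
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((style, chunk))
        style = match.group(1)  # switch style for the following text
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((style, tail))
    return segments


line = "I can't believe it. [excited] We actually won! [whisper] Don't tell anyone."
for style, chunk in split_tagged_text(line):
    print(f"{style:>8}: {chunk}")
```

A speech engine would then render each chunk with its own style, which is what lets the tone shift partway through a sentence rather than only between sentences.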

Open-source speech recognition

Xiaomi also released MiMo-V2.5-ASR, an open-source speech recognition model built for noisy, multilingual, and multi-speaker environments.

The model supports seamless switching between Chinese and English and includes recognition for major Chinese dialects such as Cantonese, Wu, Minnan, and Sichuanese. It is designed to handle overlapping speech, far-field audio, and even song lyrics mixed with background music.
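For readers unfamiliar with code-switching, the short Python sketch below shows what a mixed Chinese-English sentence looks like once it is broken into language runs. It works on text, not audio, and uses a deliberately crude rule (characters in the main CJK Unified Ideographs block count as Chinese, everything else as English). It says nothing about how MiMo-V2.5-ASR works internally, and the `split_by_script` helper is a hypothetical name.

```python
def is_cjk(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def split_by_script(text):
    """Group consecutive characters into ("zh" or "en", text) runs.

    Simplification: any non-CJK, non-space character (Latin letters,
    digits, punctuation) is labeled "en".
    """
    runs = []
    for ch in text:
        if ch.isspace():
            # Keep whitespace with the current run so words stay separated.
            if runs:
                runs[-1][1] += ch
            continue
        lang = "zh" if is_cjk(ch) else "en"
        if runs and runs[-1][0] == lang:
            runs[-1][1] += ch
        else:
            runs.append([lang, ch])
    return [(lang, chunk.strip()) for lang, chunk in runs]


sentence = "我们明天开个 meeting 讨论一下 roadmap"
for lang, chunk in split_by_script(sentence):
    print(lang, chunk)
```

A transcript like this switches language four times in one short sentence, which is the kind of input code-switching benchmarks are meant to stress.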

Built-in phonetics and context-aware punctuation reduce the need for manual cleanup. Xiaomi said the model delivers state-of-the-art or near state-of-the-art performance across benchmarks for bilingual transcription and code-switching tasks.

Availability

The MiMo-V2.5-TTS models can be tested through MiMo Studio, while MiMo-V2.5-ASR is available with open-source code and weights for direct deployment or customization.

Meesam Abbas