Mistral AI Launches Open-Source Voice Model Voxtral TTS

Introduction

Mistral AI, a French artificial‑intelligence company, announced the release of Voxtral TTS, an open‑source text‑to‑speech model. The model is built to run on a range of edge devices, from smartwatches to laptops, offering a cost‑effective solution for enterprises seeking voice‑enabled applications.

Multilingual Capabilities

Voxtral TTS supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. The model can switch between languages without losing the distinctive characteristics of a custom voice, making it suitable for dubbing and real‑time translation scenarios.

Customization and Voice Fidelity

The system can adapt to a custom voice from an audio sample shorter than five seconds. It captures subtle accents, inflections, intonations, and speech irregularities, aiming for a human‑like sound rather than a robotic tone.

Performance Metrics

Designed for real‑time use, Voxtral TTS achieves a time‑to‑first‑audio (TTFA) of 90 ms for a 10‑second, 500‑character input. Its real‑time factor (RTF) of 6× means it generates speech about six times faster than real time, so a 10‑second clip is rendered in roughly 1.7 seconds.
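
The arithmetic behind the RTF figure can be checked with a short sketch. It assumes the 6× value is a speedup over real time (audio duration divided by render time), which is how the article uses it. The function name `render_time_seconds` is hypothetical and is not part of any Mistral API.

```python
def render_time_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time needed to synthesize `audio_seconds` of speech
    when the model runs `speedup` times faster than real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# Figures quoted in the article: a 10-second clip at a 6x speedup.
clip_seconds = 10.0
speedup = 6.0
total = render_time_seconds(clip_seconds, speedup)
print(f"Full clip rendered in about {total:.2f} s")
```

Under that assumption, 10 s / 6 comes to about 1.67 seconds for the full clip, while the 90 ms TTFA means playback can begin well before rendering finishes.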

Strategic Positioning

By offering an open‑source, customizable model, Mistral seeks to attract enterprises that want to fine‑tune voice technology to their specific needs. The company highlights the model’s low cost compared with competing solutions and its suitability for integration into a broader multimodal platform that processes audio, text, and images.

Future Outlook

Mistral previously released transcription models for batch and low‑latency real‑time processing. With Voxtral TTS, the firm aims to provide a complete suite of voice products, positioning itself against competitors such as ElevenLabs, Deepgram, and OpenAI while emphasizing an end‑to‑end platform for multimodal AI applications.

Source: TechCrunch
