TechCrunch: Microsoft AI, the research arm of the tech giant, announced the rollout of three foundational multimodal models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. The transcription model supports 25 languages and is 2.5 times faster than Azure Fast. The voice model can generate a minute of audio in one second and supports custom voice creation. The image model, originally unveiled on MAI Playground, expands Microsoft’s AI portfolio and is priced below competing offerings from Google and OpenAI. The launch underscores Microsoft’s commitment to building its own AI stack while maintaining its partnership with OpenAI.