Apr 2, 2026

Microsoft Unveils New Voice, Transcription and Image AI Models

Microsoft Expands AI Portfolio with New Voice, Transcription, and Image Models

Microsoft has introduced three new artificial‑intelligence models that extend its portfolio beyond its traditional focus on large language models. The first two target audio: a voice model that can generate audio clips up to 60 seconds long, and a transcription model that converts spoken recordings into text across 25 languages. Both are designed for practical applications such as video captioning, meeting transcription, and voice‑based agents.

The third offering, MAI‑Image‑2, is the second generation of Microsoft’s in‑house image model. Compared with its predecessor, it generates images faster and produces noticeably more lifelike results. Microsoft has made these models available immediately through its Foundry platform and the MAI playground, and it plans to embed MAI‑Image‑2 in widely used products such as Bing and PowerPoint.

These releases signal Microsoft’s broader strategy to diversify its AI services and provide enterprise‑friendly tools that complement its popular Copilot suite. Copilot, which integrates tightly with the Office 365 suite and Azure cloud services, has become a staple for businesses seeking AI‑enhanced productivity. In addition to the newly announced models, Microsoft has recently rolled out Copilot Cowork and Copilot Health, further demonstrating its commitment to delivering secure, enterprise‑grade AI solutions.

Microsoft’s deep financial resources and extensive compute infrastructure let the company pursue “side quests” in generative media, efforts that even well‑funded startups sometimes cannot sustain. That capacity to invest stands in contrast to recent moves by competitors. For example, OpenAI announced that it is discontinuing its Sora video‑generation app to refocus on core activities, which illustrates how hard generative media workloads are to sustain at scale.

The broader AI industry in 2026 continues to emphasize workplace relevance, with firms like Anthropic making strides through models such as Claude Code. At the same time, the sector grapples with the high compute and energy demands of generative media. Google, another legacy tech giant, has reaffirmed its commitment to generative media while pledging to improve cost‑ and energy‑efficiency with new offerings like the Veo 3.1 Lite video model.

Overall, Microsoft’s latest AI models underscore a strategic push to broaden its AI ecosystem, deliver tangible productivity tools, and leverage its scale to stay ahead in a competitive landscape that balances innovation with the practical demands of enterprise customers.


Source: CNET
