OpenAI adds real‑time voice, translation and transcription to its API
OpenAI unveiled a trio of voice‑intelligence models for its API on Thursday, signaling a shift from simple call‑and‑response systems to more versatile audio interfaces. The flagship, GPT‑Realtime‑2, builds on the earlier GPT‑Realtime‑1.5 and adds GPT‑5‑class reasoning, which lets it handle complex user requests while keeping a natural conversational tone.
Alongside the new conversational model, OpenAI introduced GPT‑Realtime‑Translate, a real‑time translation engine that supports over 70 source languages and can output speech in 13 target languages. The company describes the service as keeping pace with a speaker, delivering fluent, context‑aware translations as a dialogue unfolds.
The third addition, GPT‑Realtime‑Whisper, provides live speech‑to‑text conversion. It transcribes audio streams as people speak, so developers get text in real time instead of recording first and running a separate transcription pass afterward.
All three models are accessible through OpenAI’s Realtime API. Pricing differs by function: translation and transcription are billed by the minute of audio, while the conversational model is billed by token consumption. Developers can therefore estimate translation and transcription costs from call duration, and conversational‑agent costs from how many tokens each exchange uses.
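Because the three models use two different billing units, a budget estimate has to combine minute‑based and token‑based charges. The sketch below shows that arithmetic only. The `Rates` fields and `estimate_cost` function are hypothetical names, the rates are caller‑supplied placeholders rather than OpenAI’s published prices, and the single input and output rate per million tokens is a simplifying assumption; the actual price sheet may split tokens further (for example, audio versus text).

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical placeholder rates in USD; NOT OpenAI's published prices."""
    translate_per_minute: float        # minute-billed translation
    transcribe_per_minute: float       # minute-billed transcription
    input_per_million_tokens: float    # assumed single input-token rate
    output_per_million_tokens: float   # assumed single output-token rate


def estimate_cost(
    rates: Rates,
    translate_minutes: float = 0.0,
    transcribe_minutes: float = 0.0,
    input_tokens: int = 0,
    output_tokens: int = 0,
) -> float:
    """Combine minute-based and token-based charges into one estimate."""
    # Translation and transcription: charged per minute of audio.
    minute_cost = (
        rates.translate_per_minute * translate_minutes
        + rates.transcribe_per_minute * transcribe_minutes
    )
    # Conversational model: charged per token, quoted per million tokens.
    token_cost = (
        rates.input_per_million_tokens * input_tokens
        + rates.output_per_million_tokens * output_tokens
    ) / 1_000_000
    return minute_cost + token_cost


if __name__ == "__main__":
    # Illustrative numbers only, chosen to make the arithmetic easy to follow.
    rates = Rates(0.10, 0.02, 30.0, 60.0)
    total = estimate_cost(rates, translate_minutes=10,
                          input_tokens=200_000, output_tokens=50_000)
    print(f"${total:.2f}")  # 1.00 (translation) + 6.00 + 3.00 (tokens) = $10.00
```

Keeping the rates as inputs rather than hard‑coding them means the same helper still works if prices change or if a deployment only uses one of the three models.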
OpenAI highlighted several sectors that could benefit from the new capabilities. Customer‑service platforms can deploy voice agents that listen, reason and act within a single interaction. Educational tools may use real‑time translation to bridge language barriers, while media outlets and event organizers can automate captioning and multilingual coverage. Creator platforms stand to gain from seamless voice integration that enhances user engagement.
Recognizing the potential for abuse, OpenAI built safeguards into the models: automated triggers halt conversations that are detected as violating the company’s harmful‑content policy, with the aim of curbing spam, fraud and other malicious activity. The firm said these guardrails are part of its broader effort to ensure responsible deployment of powerful AI tools.
Industry observers note that the announcement expands OpenAI’s competitive edge in the rapidly growing voice‑AI market. By offering a single API that handles conversation, translation and transcription, the company reduces the need for developers to stitch together multiple services. The move could accelerate adoption of voice interfaces across a range of applications, from virtual assistants to real‑time multilingual support.