OpenAI has announced the release of three new real-time voice models for its API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. These models are designed to enhance voice intelligence and enable more sophisticated voice-driven experiences in various applications.

What Happened

The new voice models were announced on May 7, 2026, as part of OpenAI's ongoing efforts to advance the capabilities of its API. The release includes three distinct models, each with unique features and capabilities. GPT-Realtime-2 is a voice model with GPT-5-class reasoning capabilities, designed to handle complex requests and maintain conversational flow. It also offers more controllable tone and delivery, allowing agents to respond with emotional nuance.

GPT-Realtime-Translate is a live translation model that supports speech from over 70 input languages into 13 output languages. This feature is aimed at revolutionizing multilingual communication and enabling real-time translation services for global customer support, sales, and educational platforms. GPT-Realtime-Whisper is a new streaming speech-to-text model designed for ultra-low latency transcription, ensuring that live captions, meeting notes, and other speech-to-text applications feel instantaneous and natural.

Background and Context

The development of these new voice models is part of OpenAI's ongoing efforts to advance the capabilities of its API. The company has been working on improving the accuracy and efficiency of its voice models, with a focus on enabling more sophisticated voice-driven experiences in various applications. The release of GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper marks a significant step forward in this effort.

The new voice models are designed to enable more advanced voice interfaces that can actively listen, reason, translate, transcribe, and take action while the conversation is still unfolding. This is a departure from simple call-and-response systems, which have been the norm in voice AI applications until now. The company highlights three emerging patterns in voice AI: voice-to-action, systems-to-voice, and voice-to-voice.

Why It Matters to the Industry

The release of these new voice models has significant implications for various industries, including customer support, sales, education, media, events, and creator platforms. The ability to provide real-time translation services in over 70 languages will enable companies to expand their global reach and improve communication with customers from diverse linguistic backgrounds.

The ultra-low latency transcription capabilities of GPT-Realtime-Whisper will also have a significant impact on industries that rely heavily on speech-to-text applications, such as live captioning and meeting notes. The controllable tone and delivery features of GPT-Realtime-2 will enable more natural and empathetic interactions between humans and machines.

What Comes Next

18,166 page views

Originally surfaced from this brief. Approximately 398 words.
Mentioned: OpenAI