Q: What are the three new voice models announced by OpenAI?

The three new voice models announced by OpenAI are GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.

Q: What languages does GPT-Realtime-Translate support for live translation?

GPT-Realtime-Translate supports speech from over 70 input languages into 13 output languages.

OpenAI Unveils Three New Real-Time Voice Models for API

Q: What is the unique feature of GPT-Realtime-2?

GPT-Realtime-2 offers more controllable tone and delivery, allowing agents to respond with emotional nuance.

Q: In what industries do these new voice models have significant implications?

These new voice models have significant implications for various industries, including customer support, sales, education, media, events, and creator platforms.

OpenAI announces GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, three models designed to enhance voice intelligence and multilingual communication.

OpenAI has announced the release of three new real-time voice models for its API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. These models are designed to enhance voice intelligence and enable more sophisticated voice-driven experiences in various applications.

What Happened

OpenAI announced the new voice models on May 7, 2026, as part of its ongoing efforts to advance the capabilities of its API. The release comprises three models, each serving a different purpose. GPT-Realtime-2 is a voice model with GPT-5-class reasoning capabilities, designed to handle complex requests and maintain conversational flow. It also offers more controllable tone and delivery, allowing agents to respond with emotional nuance.

GPT-Realtime-Translate is a live translation model that translates speech from over 70 input languages into 13 output languages. It is aimed at multilingual communication, enabling real-time translation services for global customer support, sales, and educational platforms. GPT-Realtime-Whisper is a new streaming speech-to-text model designed for ultra-low-latency transcription, so that live captions, meeting notes, and other speech-to-text applications feel instantaneous and natural.

Background and Context

OpenAI has been working on improving the accuracy and efficiency of its voice models, with a focus on enabling more sophisticated voice-driven experiences. The release of GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper marks a significant step forward in this effort.

The new voice models are designed to enable more advanced voice interfaces that can actively listen, reason, translate, transcribe, and take action while the conversation is still unfolding. This is a departure from simple call-and-response systems, which have been the norm in voice AI applications until now. The company highlights three emerging patterns in voice AI: voice-to-action, systems-to-voice, and voice-to-voice.

Why It Matters to the Industry

The release of these new voice models has significant implications for various industries, including customer support, sales, education, media, events, and creator platforms. Real-time translation from over 70 input languages into 13 output languages will enable companies to expand their global reach and improve communication with customers from diverse linguistic backgrounds.

The ultra-low-latency transcription of GPT-Realtime-Whisper will matter most to applications that rely heavily on speech-to-text, such as live captioning and meeting notes. The controllable tone and delivery of GPT-Realtime-2 will enable more natural and empathetic interactions between humans and machines.

82,600 page views

Originally surfaced from this brief. Approximately 398 words.

Mentioned: OpenAI

