AI Tools Verified · 1 source · primary source

OpenAI launches GPT-Realtime-2, Translate, and Whisper for live voice apps

OpenAI says three Realtime API audio models—GPT‑Realtime‑2, GPT‑Realtime‑Translate, and GPT‑Realtime‑Whisper—support voice agents that reason, translate, and transcribe in real time.

Posted
May 8, 2026 · 8:30 AM
Original source
May 7, 2026 · Source age: 1 day
Read time
2 min
Sources
1
Verified briefing

Passed source freshness, duplicate, QA, and review checks before publishing. Main source freshness limit: 14 days.

Source count
1
Primary sources
1
QA status
pass

Plain English

What this means in simple words

Instead of recording audio and sending it later, an app can stream audio to the API continuously and get immediate speech-to-text, translation, and spoken replies.
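To make the idea concrete, here is a minimal sketch of one conversational turn over a streaming connection. It does not use the actual Realtime API or the models named in this briefing; `realtime_turn`, `respond`, `min_chunks`, and the chunk format are all hypothetical, illustrating only why streaming feels immediate: the model can start answering before the recording is finished.

```python
def realtime_turn(audio_chunks, respond, min_chunks=2):
    """Sketch of one turn in a streaming voice session: audio keeps
    flowing in, and a draft reply appears as soon as the model has
    heard enough, instead of waiting for the full recording."""
    heard = []
    events = []
    for chunk in audio_chunks:
        heard.append(chunk)
        events.append(("audio_in", chunk))
        if len(heard) >= min_chunks:
            # Early, revisable reply gives the user immediate feedback.
            events.append(("reply_draft", respond(heard)))
    # Final reply once the speaker's turn ends.
    events.append(("reply_final", respond(heard)))
    return events

# Toy responder: echoes what it heard; a real model would reason
# over the audio and synthesize speech.
events = realtime_turn(["hi ", "there"],
                       lambda heard: "heard: " + "".join(heard))
```

The interleaved `audio_in` and `reply_draft` events are the point: in a record-then-send design, nothing would come back until after the last chunk.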

What happened

On May 7, 2026, OpenAI introduced three Realtime API audio models: GPT‑Realtime‑2 for voice interactions with stronger reasoning, GPT‑Realtime‑Translate for live speech translation, and GPT‑Realtime‑Whisper for streaming transcription.

Why it matters

Realtime voice apps often struggle with long context, tool calls, or multilingual use. These models target lower-latency voice agents that can keep a conversation going while translating or transcribing, which can expand where voice interfaces are practical.

Key points

  • GPT‑Realtime‑2 targets live conversations with stronger reasoning and longer context for agent workflows.
  • GPT‑Realtime‑Translate supports live speech translation across 70+ input languages into 13 output languages.
  • GPT‑Realtime‑Whisper provides low-latency streaming transcription priced per minute.

What to watch

Watch developer adoption in the Realtime API, how translation quality holds up in noisy settings, and whether voice agents reliably handle interruptions and tool calls in production.

Key terms

Realtime API
An API pattern where audio is streamed continuously so models can respond while a conversation is happening.
Streaming transcription
Speech-to-text that outputs partial text as someone speaks, reducing perceived latency.
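The streaming-transcription term above can be sketched in a few lines. This does not call any real transcription service; `stream_transcribe`, `decode`, and the chunk format are hypothetical, showing only the partial/final event shape that reduces perceived latency.

```python
def stream_transcribe(audio_chunks, decode):
    """Yield a revisable partial transcript after every audio chunk,
    then a final one -- the shape of streaming speech-to-text."""
    heard = []
    for chunk in audio_chunks:
        heard.append(chunk)
        # Partial hypotheses may be revised as more audio arrives.
        yield ("partial", decode(heard))
    # Final transcript over the complete audio.
    yield ("final", decode(heard))

# Toy decoder: joins chunk labels; a real model decodes audio frames.
transcript = list(stream_transcribe(["hel", "lo ", "wor", "ld"],
                                    lambda heard: "".join(heard)))
```

Each partial arrives while the speaker is mid-sentence, so the UI can show text growing in place instead of a blank screen until the end.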

Sources

Source dates are original publication dates. The posted date above is when The AI Tea published this explanation.
