OpenAI’s gpt-realtime Enables Production-Ready Voice Agents with End-to-End Speech Processing
Sep 11, 2025 2 min read
OpenAI has released gpt-realtime, its most advanced speech-to-speech model, alongside the general availability of the Realtime API. The updates aim to reduce latency, improve speech quality, and give developers stronger tools, such as MCP server support, image input, and Session Initiation Protocol (SIP) phone calling support, for building production-ready AI voice agents.
Together, the Realtime API and gpt-realtime are designed to handle end-to-end speech processing within a single system, rather than chaining together separate speech-to-text and text-to-speech models. This architecture cuts response times while preserving nuance in delivery, a critical improvement for real-time agents, where even small delays can break conversational flow.
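For a sense of what the single-system approach looks like in code, the minimal sketch below (Node.js with the `ws` package) opens one Realtime API session over a WebSocket that carries both incoming and outgoing audio, with no separate speech-to-text or text-to-speech hop in the application. The endpoint, model name, and event names follow OpenAI's published Realtime API patterns but should be verified against the current reference.

```typescript
// Minimal sketch: one WebSocket session handles listening and speaking end to end.
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-realtime",
  { headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` } }
);

ws.on("open", () => {
  // Configure the session once; the same connection then streams audio both ways.
  ws.send(JSON.stringify({
    type: "session.update",
    session: { instructions: "You are a concise, friendly voice assistant." },
  }));
});

ws.on("message", (data) => {
  const event = JSON.parse(data.toString());
  // Audio arrives as base64-encoded deltas while the model speaks; an application
  // would forward these chunks to its audio output. (Exact event names can vary
  // between API versions, so check the reference.)
  if (event.type === "response.output_audio.delta") {
    // play event.delta ...
  }
});
```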
gpt-realtime was trained to produce higher-quality speech with more natural pacing and intonation, and to respond reliably to style instructions such as “speak empathetically” or “use a professional tone.” Two new synthetic voices, Cedar and Marin, are available, and existing voices have been updated for greater realism.
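Voice selection and speaking style are part of the session configuration. Continuing the connection sketch above, the hypothetical snippet below picks one of the new voices and passes natural-language style instructions; the voice identifier and field names are assumptions to confirm against the Realtime API reference.

```typescript
// Hypothetical: choose a voice and steer delivery style via instructions.
ws.send(JSON.stringify({
  type: "session.update",
  session: {
    voice: "marin", // or "cedar"; identifiers assumed from the announcement
    instructions:
      "Speak empathetically and keep a professional, unhurried tone. " +
      "Pause briefly before answering billing questions.",
  },
}));
```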
On comprehension benchmarks, gpt-realtime shows measurable improvements. It can track non-verbal cues, switch languages within a single sentence, and more accurately process alphanumeric sequences such as phone numbers and VINs across languages, including Spanish, Chinese, Japanese, and French. Internal testing highlights this jump: gpt-realtime reaches 82.8% accuracy on the Big Bench Audio benchmark, compared to 65.6% for the previous model. Instruction following is also sharper, with scores on the MultiChallenge audio benchmark rising from 20.6% to 30.5%.
Function calling is another area of focus. The model now performs better at identifying relevant functions, calling them at the right time, and supplying the correct arguments. On ComplexFuncBench, accuracy rose from 49.7% to 66.5%. Asynchronous function calling has also been updated, allowing the voice agent to continue the conversation while waiting for results, a capability with obvious value for customer support and transactional applications.
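In a Realtime session, function calling follows the familiar pattern of declaring tools up front and returning results when the model asks for them. The sketch below, continuing the earlier connection, declares a hypothetical lookup_order_status tool and answers the model's call; the event and field names reflect documented Realtime event shapes but should be checked against the current reference.

```typescript
// Hypothetical stand-in for a real order-management lookup.
async function lookupOrderStatus(orderNumber: string) {
  return { orderNumber, status: "shipped" };
}

// Declare the tool when configuring the session.
ws.send(JSON.stringify({
  type: "session.update",
  session: {
    tools: [{
      type: "function",
      name: "lookup_order_status",
      description: "Look up the shipping status of an order by order number.",
      parameters: {
        type: "object",
        properties: { order_number: { type: "string" } },
        required: ["order_number"],
      },
    }],
  },
}));

// When the model decides to call the tool, run the lookup and return the result.
ws.on("message", async (data) => {
  const event = JSON.parse(data.toString());
  if (event.type === "response.function_call_arguments.done") {
    const args = JSON.parse(event.arguments);
    const status = await lookupOrderStatus(args.order_number);
    ws.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: event.call_id,
        output: JSON.stringify(status),
      },
    }));
    ws.send(JSON.stringify({ type: "response.create" })); // let the model resume speaking
  }
});
```

With the asynchronous behavior described above, the application can take its time producing the function_call_output; the model keeps the conversation going rather than stalling while it waits.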
The Realtime API has been upgraded to align with production requirements. Developers can now connect remote MCP servers directly into a session, enabling tool calls without manual integration work. Image input is supported, allowing applications to ground conversations in visual context, such as screenshots or photos. SIP support makes it possible to integrate voice agents with existing telephony systems, including PBXs and desk phones. Reusable prompts simplify session management, while full EU data residency support addresses compliance concerns for European deployments.
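These capabilities are likewise exposed through session and conversation events. The hedged sketch below attaches a remote MCP server and adds an image to the conversation; the MCP tool shape mirrors OpenAI's MCP configuration in its other APIs, the server URL is a placeholder, and the exact keys should be treated as assumptions to verify against the Realtime API documentation.

```typescript
// Assumption: remote MCP servers are attached as a tool entry in the session config.
ws.send(JSON.stringify({
  type: "session.update",
  session: {
    tools: [{
      type: "mcp",
      server_label: "billing-tools",             // placeholder label
      server_url: "https://mcp.example.com/sse",  // placeholder URL
      authorization: process.env.MCP_SERVER_TOKEN,
      require_approval: "never",
    }],
  },
}));

// Assumption: image input is added as a user message item with input_image content,
// giving the model visual context (for example, a screenshot) to ground the conversation.
ws.send(JSON.stringify({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [
      { type: "input_text", text: "What error is shown on this screen?" },
      { type: "input_image", image_url: "data:image/png;base64,<base64-encoded screenshot>" },
    ],
  },
}));
```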
According to the release notes, early enterprise partners are testing these capabilities in production-like scenarios. Zillow is piloting voice-driven home search, while T-Mobile is exploring customer service use cases where real-time adaptability is essential. Both companies highlight the shift from scripted automation to more flexible, domain-specific expertise delivered through AI agents.
OpenAI has also reinforced safeguards around deployment. The Realtime API incorporates classifiers that can terminate harmful conversations, developers can add domain-specific guardrails via the Agents SDK, and the API's preset voices reduce impersonation risks.
Both the gpt-realtime model and the Realtime API are now available to all developers. To get started, developers can visit the Realtime API documentation and prompting guide, and try the gpt-realtime demo in the Playground.