OpenAI Launches Public Beta of Realtime API for Low-Latency Speech Interactions
Oct 14, 2024
OpenAI launched the public beta of the Realtime API, offering developers the ability to create low-latency, multimodal voice interactions within their applications. Additionally, audio input/output is now available in the Chat Completions API, expanding options for voice-driven applications. Early feedback highlights limited voice options and response cutoffs, similar to ChatGPT’s Advanced Voice Mode.
The Realtime API enables real-time, natural speech-to-speech interactions using six preset voices, combining speech recognition and synthesis into a single API call. This simplifies the development of fluid conversational applications by streamlining what previously required multiple models.
OpenAI has also extended the capabilities of its Chat Completions API by adding support for audio input and output. This feature is geared towards use cases that do not require the low-latency performance of the Realtime API, allowing developers to send either text or audio inputs and receive responses in text, audio, or both.
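Since the audio-capable Chat Completions API had not yet shipped at the time of the announcement, the following Python sketch is only an indication of what such a request might look like; the model name (gpt-4o-audio-preview) and the modalities/audio parameter shapes are assumptions based on OpenAI's preview material and could differ at release.

import base64
from openai import OpenAI  # pip install openai

client = OpenAI()

# Base64-encode a short voice recording for the request body.
with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",        # assumed preview model name
    modalities=["text", "audio"],        # request both a transcript and spoken audio
    audio={"voice": "alloy", "format": "wav"},
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Answer the question in this recording."},
            {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
        ],
    }],
)

# The reply carries a text transcript plus base64-encoded audio.
print(completion.choices[0].message.audio.transcript)
with open("answer.wav", "wb") as out:
    out.write(base64.b64decode(completion.choices[0].message.audio.data))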
In the past, creating voice assistant experiences required using multiple models for different tasks, such as automatic speech recognition, text inference, and text-to-speech. This often resulted in delays and lost nuance. The Realtime API addresses these issues by streamlining the entire process into a single API call, offering faster and more natural conversational capabilities.
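For comparison, the traditional chained approach looks roughly like the sketch below, wiring together OpenAI's existing speech-to-text, chat, and text-to-speech endpoints (file names and prompts are illustrative):

from openai import OpenAI  # pip install openai

client = OpenAI()

# 1. Automatic speech recognition: transcribe the user's audio with Whisper.
with open("user_question.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

# 2. Text inference: generate a reply with a chat model.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Text-to-speech: synthesize the reply back into audio.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
with open("assistant_reply.mp3", "wb") as out:
    out.write(speech.content)

# Each hop adds latency, and paralinguistic cues such as tone and emphasis
# are lost when the audio is flattened into text between steps.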
The Realtime API is powered by a persistent WebSocket connection, allowing for continuous message exchange with GPT-4o. It also supports function calling, enabling voice assistants to perform tasks such as placing orders or retrieving relevant user data to provide more personalized responses.
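A minimal Python sketch of that WebSocket flow is shown below. The endpoint URL, beta header, and event names follow OpenAI's launch documentation, but the model snapshot name is an assumption and the event handling is heavily simplified.

import asyncio
import json
import os
import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # On newer versions of the websockets library this keyword is additional_headers.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model to generate a spoken (and textual) response.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text", "audio"],
                "instructions": "Greet the user and ask how you can help.",
            },
        }))
        # The server streams events back (text deltas, audio chunks, and so on)
        # over the same persistent connection.
        async for message in ws:
            event = json.loads(message)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())

In a full voice assistant, the client would also stream microphone audio into the session and register tools so the model can trigger function calls such as placing an order or looking up user data.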
Additionally, the community observed that although the API is accessible through the Playground, the available voice options are currently limited to alloy, echo, and shimmer. During testing, users noticed that the responses were subject to the same limitations as ChatGPT’s Advanced Voice Mode. Despite attempts to use detailed system messages, the responses were still cut off, hinting at the involvement of a separate model managing the flow of conversations.
The Realtime API is available in public beta for all paid developers. Audio in the Chat Completions API will be released in the coming weeks. Pricing for the Realtime API includes both text and audio tokens, with audio input priced at approximately 0ドル.06 per minute and audio output at 0ドル.24 per minute.
There have been concerns about how this pricing could affect long-duration interactions. Some developers pointed out that because each new response requires the model to reprocess the prior exchanges, costs can accumulate quickly. This has raised questions about cost-effectiveness for extended conversations, since large language models do not retain short-term memory and must re-ingest prior content with every turn.
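For a rough sense of scale, the back-of-the-envelope calculation below uses only the per-minute rates from the announcement; the conversation shape, and the assumption that prior turns are re-billed as input, are illustrative of the concern rather than confirmed billing behavior.

# Rough cost estimate for one Realtime API session (rates from the announcement,
# conversation shape hypothetical).
AUDIO_INPUT_PER_MIN = 0.06   # USD, approximate
AUDIO_OUTPUT_PER_MIN = 0.24  # USD, approximate

user_minutes = 5.0       # total minutes of user speech
assistant_minutes = 5.0  # total minutes of assistant speech

base_cost = user_minutes * AUDIO_INPUT_PER_MIN + assistant_minutes * AUDIO_OUTPUT_PER_MIN
print(f"Audio cost, ignoring context reprocessing: ${base_cost:.2f}")

# If each new response re-sends earlier turns as input, as some developers
# suspected, input costs grow roughly quadratically with the number of turns.
turns = 10
minutes_per_turn = user_minutes / turns
reprocessed_minutes = sum(i * minutes_per_turn for i in range(turns))
print(f"Extra input from reprocessing: {reprocessed_minutes:.1f} min "
      f"(about ${reprocessed_minutes * AUDIO_INPUT_PER_MIN:.2f})")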
Developers can explore the Realtime API by checking the official documentation and the reference client.