AI, ML & Data Engineering

OpenAI’s GPT-5 Debuts with Commoditizing Costs and Higher Scrutiny

Aug 11, 2025

Andrew Hoblitzell

On August 7, 2025, OpenAI rolled out GPT-5 to ChatGPT users and to the API, with a router that decides when to “think” longer, new model sizes, and pricing aimed at production use. The product page also advertises a 400K token context with 128K maximum output tokens for the new lineup.
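A rough pre-flight check against the advertised limits might look like the sketch below. It assumes the 400K window is shared between prompt and output, which the figures above do not spell out, and the function name `fits_budget` is purely illustrative.

```python
CONTEXT_WINDOW = 400_000  # advertised context size, in tokens
MAX_OUTPUT = 128_000      # advertised maximum output tokens


def fits_budget(prompt_tokens: int, requested_output: int) -> bool:
    """Return True if a request stays within the advertised limits.

    Assumption: prompt and output tokens share the same 400K window.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output > MAX_OUTPUT:
        return False
    return prompt_tokens + requested_output <= CONTEXT_WINDOW
```

Under that assumption, a request that reserves the full 128K output leaves 272K tokens for the prompt.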

OpenAI presented bar charts intended to illustrate improvements in GPT-5’s deception rates and benchmark performance, only to have the bars visually contradict the numbers printed on them. One chart labeled “coding deception” showed GPT-5 (with thinking) at 50 percent yet drew it with a noticeably shorter bar than OpenAI’s o3 at 47.4 percent; the written figure for GPT-5 was later corrected to 16.5 percent in the blog post. In another slide, values of 69.1 percent and 30.8 percent were drawn as bars of equal height, while a 52.8 percent figure appeared taller, flipping the intended ranking and misleading the viewer.
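This kind of chart error is cheap to catch automatically. The sketch below flags any pair of bars whose drawn heights disagree with the ordering of their labeled values. The function name and the example heights are hypothetical; the article only says two bars were equal and one was taller.

```python
from itertools import combinations


def ranking_mismatches(values, heights, tol=1e-9):
    """Return index pairs whose drawn heights contradict their labeled values."""
    if len(values) != len(heights):
        raise ValueError("values and heights must have the same length")

    def sign(x):
        # -1, 0, or 1, treating differences within tol as ties
        return (x > tol) - (x < -tol)

    bad = []
    for i, j in combinations(range(len(values)), 2):
        if sign(values[i] - values[j]) != sign(heights[i] - heights[j]):
            bad.append((i, j))
    return bad


# Labeled values from the slide; heights are made-up stand-ins for
# "52.8 drawn tallest, 69.1 and 30.8 drawn equal".
labels = [52.8, 69.1, 30.8]
drawn = [0.9, 0.6, 0.6]
problems = ranking_mismatches(labels, drawn)
```

With these inputs, `problems` contains the 52.8/69.1 pair (order flipped) and the 69.1/30.8 pair (drawn as a tie).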

The company’s API surface has consolidated around the Responses API. Introduced in March and expanded in May, it is the primary primitive for “agentic” apps, combining multimodal prompting with built-in tools. OpenAI’s update details direct access to image generation, Code Interpreter, improved file search, and remote Model Context Protocol (MCP) servers from a single request. It also adds background mode for long-running tasks, reasoning summaries, and encrypted reasoning items.

OpenAI asserted that for broader ML-engineering tasks such as MLE-Bench and Kaggle-like GPU workloads, the ChatGPT agent (the routed product system) scores highest with a 9 percent bronze pass rate on a curated subset. On SWE-Lancer, which consists of full-stack tasks verified by end-to-end tests, the ChatGPT agent is likewise the best performer. These results suggest GPT-5’s reasoning model is strongest on code-centric debugging and replication, while the routed agent does better on long-horizon, multi-skill workloads.

By opening GPT-5 to everyone immediately, OpenAI locks in massive network effects. New users flock in, existing users upgrade en masse, and ChatGPT’s market pull intensifies. They will spend more to service the use of their most powerful model. GPT-5 is priced at $1.25 per million input tokens and $10 per million output tokens, about half the input cost of GPT-4o. The upside is that they will continue to grow the number of new users integrating ChatGPT into their daily lives. – Reid Hoffman
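For capacity planning, the quoted list prices translate into a simple per-request estimate. The sketch below uses only the two prices stated above. The function name is illustrative, and it ignores any discounts, caching, or tiering that may apply in practice.

```python
from decimal import Decimal

# List prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_MILLION = Decimal("1.25")
OUTPUT_USD_PER_MILLION = Decimal("10")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the quoted list prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    million = Decimal(1_000_000)
    return (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    ) / million
```

A request with 200,000 input tokens and 5,000 output tokens works out to $0.25 for input plus $0.05 for output, or $0.30 in total.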

According to OpenAI, model reliability also improved materially. Against open-ended factuality sets such as LongFact and FActScore, GPT-5 models show substantially lower hallucination rates than OpenAI o3 and prior baselines, with gpt-5-thinking producing over five times fewer factual errors in both browse-on and browse-off settings across the three benchmarks. METR’s autonomy review concludes it is unlikely GPT-5 would increase AI R&D by a factor of ten, conduct strategic sandbagging, or achieve rogue replication. The observed 50 percent time-horizon is approximately two hours and fifteen minutes.

Structured outputs have matured. OpenAI’s cookbook shows strict JSON Schema enforcement with a single flag, which makes it realistic to guarantee shape for downstream systems without post-hoc validators and brittle regex fallbacks. This pairs well with function calling for tool use and reduces glue code in extraction, enrichment, and integration pipelines.
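As an illustration, the sketch below builds a request body that asks for strict schema adherence. It makes no network call. The `text.format` layout reflects our understanding of the Responses API and should be checked against the current API reference; the `invoice_extraction` schema, its fields, and the helper name are hypothetical.

```python
import json


def build_structured_request(document_text: str) -> dict:
    """Build a request body that asks for strict JSON Schema output."""
    # Strict mode is understood to require every property to be listed in
    # "required" and "additionalProperties" to be False.
    schema = {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            "currency": {"type": "string"},
        },
        "required": ["vendor", "total", "currency"],
        "additionalProperties": False,
    }
    return {
        "model": "gpt-5",
        "input": f"Extract the invoice fields from:\n{document_text}",
        "text": {
            "format": {
                "type": "json_schema",
                "name": "invoice_extraction",  # hypothetical schema name
                "strict": True,                # the single enforcement flag
                "schema": schema,
            }
        },
    }


request_body = build_structured_request("ACME Corp, total due 120.50 EUR")
serialized = json.dumps(request_body)  # ready to send with any HTTP client
```

Keeping the schema in one place like this lets downstream consumers import the same definition rather than re-deriving the expected shape.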

For teams building agent workflows, the surrounding ecosystem matters as much as the base model. OpenAI’s Agents SDK provides orchestration and tracing, and it speaks MCP to connect models to tools hosted on remote servers such as CRM, payment, or support systems. Because MCP is an open protocol, you can standardize tool access across vendors or swap models without rewriting every integration. This is important for portability planning and for containing long-run switching costs.

“Good progress on many fronts but still part of the pack, not a giant leap forward.. lots of questions TBD about real-world performance, obviously not AGI.” – Gary Marcus

User response on Reddit has been volatile since the GPT-5 rollout. Threads in r/ChatGPT report disappointment with perceived changes in tone, stricter rate limits, and the removal of older models. Posts titled “GPT5 is horrible” and “GPT-5 launch” drew heavy discussion. Tech press aggregated the reaction and reported that OpenAI restored GPT-4o as an option following the backlash. For teams shipping against the ChatGPT runtime, the episode is a reminder to instrument user sentiment channels, preserve feature flags for model routing, and plan reversibility in rollout playbooks.
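One way to keep a model rollout reversible is to route every request through a small flag-controlled selector. The sketch below is a minimal in-memory version: the class name, the per-user override map, and the rollback log are all assumptions made for illustration, not anything OpenAI or the article describes.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRouter:
    """Pick a model per request, with a flag that allows instant rollback."""

    new_model: str
    previous_model: str
    new_model_enabled: bool = True                  # the feature flag
    overrides: dict = field(default_factory=dict)   # user_id -> pinned model
    rollback_log: list = field(default_factory=list)

    def choose(self, user_id: str) -> str:
        # A per-user pin wins, so users can stay on a model they prefer.
        if user_id in self.overrides:
            return self.overrides[user_id]
        return self.new_model if self.new_model_enabled else self.previous_model

    def roll_back(self, reason: str) -> None:
        # Record why the flag was flipped, then send traffic to the old model.
        self.rollback_log.append(reason)
        self.new_model_enabled = False


router = ModelRouter(new_model="gpt-5", previous_model="gpt-4o")
router.overrides["user-42"] = "gpt-4o"
before = router.choose("user-1")
router.roll_back("negative sentiment spike after launch")
after = router.choose("user-1")
```

In a real deployment the flag and overrides would live in a configuration service rather than in process memory, so a rollback takes effect without a redeploy.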

Developers looking to learn more can refer to the GPT-5 system card and follow other recent OpenAI coverage on InfoQ.
