
OpenAI Releases gpt-oss-120b and gpt-oss-20b, Open-Weight Language Models for Local Deployment


Aug 08, 2025 · 2 min read


OpenAI has released gpt-oss-120b and gpt-oss-20b, two open-weight language models designed for high-performance reasoning, tool use, and efficient deployment. These are the company’s first fully open-weight language models since GPT-2, and are available under the permissive Apache 2.0 license.

The gpt-oss-120b model activates 5.1 billion parameters per token using a mixture-of-experts architecture. It matches or surpasses the proprietary o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. The smaller gpt-oss-20b model activates 3.6 billion of its 21 billion parameters and can run on consumer-grade hardware with just 16 GB of memory, making it suitable for on-device inference or rapid iteration without reliance on cloud infrastructure.
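For readers unfamiliar with the architecture, the toy sketch below shows why a mixture-of-experts model activates only a fraction of its total parameters per token: a router scores a set of expert networks for each token and only the top-k of them actually run. The layer sizes, expert count, and routing here are illustrative assumptions, not gpt-oss's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy top-k mixture-of-experts layer (illustrative, not gpt-oss's code)."""

    def __init__(self, d_model: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores experts per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its top_k experts,
        # so only top_k / num_experts of the expert weights do work per token.
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # best experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

# Usage: 10 tokens flow through; per token only 2 of the 8 experts run.
layer = ToyMoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```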

Both models support advanced use cases, including chain-of-thought reasoning, tool use, and structured outputs. Developers can configure the model to apply varying levels of reasoning effort, striking a balance between speed and accuracy.
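As a concrete illustration, the sketch below shows how the effort level can be requested through the system message when formatting a prompt with the Hugging Face tokenizer. The "Reasoning: &lt;level&gt;" convention follows the published gpt-oss model card; the sample question is made up, and the exact template details may evolve.

```python
from transformers import AutoTokenizer

# Per the gpt-oss model card, the reasoning effort ("low", "medium", "high")
# is declared in the system message; the chat template folds it into the prompt.
tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

for effort in ("low", "medium", "high"):
    messages = [
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": "Plan a 3-step migration to microservices."},
    ]
    prompt = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Higher effort lets the model spend more chain-of-thought tokens before
    # answering; lower effort favors latency.
    print(f"--- {effort} ---\n{prompt[:200]}")
```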

Trained using techniques adapted from OpenAI’s internal o-series models, the gpt-oss models use rotary positional embeddings and grouped multi-query attention, and support 128k context lengths. They were evaluated on coding, health, math, and agentic benchmarks, including MMLU, HealthBench, Codeforces, and TauBench, showing strong performance even when compared to closed models like o4-mini and GPT-4o.
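As background, rotary positional embeddings encode position by rotating pairs of query/key channels by a position-dependent angle, so attention scores depend on relative rather than absolute position. The minimal sketch below shows the core idea; gpt-oss's real implementation differs in details such as scaling and per-head layout.

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Minimal rotary positional embedding (RoPE) sketch.

    x: (seq_len, dim) with dim even. Each channel pair is rotated by an
    angle proportional to the token's position, at a pair-specific frequency.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One frequency per channel pair, geometrically spaced
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation applied channel-pair-wise
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)

# Usage: embed positions into an 8-token, 16-dim query matrix.
q = torch.randn(8, 16)
print(rotary_embed(q).shape)  # torch.Size([8, 16])
```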



[Benchmark comparison chart. Source: OpenAI Blog]

OpenAI released the models without applying direct supervision to their chain-of-thought (CoT) reasoning, enabling researchers to study reasoning traces for potential issues such as bias or misuse.

To assess risk, OpenAI performed worst-case-scenario fine-tuning on the models using adversarial data in biology and cybersecurity. Even with strong fine-tuning efforts, the models did not reach high-risk capability levels according to OpenAI’s Preparedness Framework. Findings from external expert reviewers informed the final release. The company has also launched a red-teaming challenge with a 500,000ドル prize pool to further evaluate the models under real-world conditions.

The models are available on Hugging Face and several deployment platforms. The 20B model can be run locally with just 16 GB of RAM. As one Reddit user asked:

Can this model be used on a computer without connecting to the internet locally? What is the lowest-powered computer (Altman says ‘high end’) that can run this model?

Another user clarified:

After downloading, you don't need the internet to run it. As for specs: you’ll need something with at least 16GB of RAM (VRAM or system) for the 20B to ‘run’ properly. A MacBook Air with 16GB can run this at tens of tokens per second. A modern GPU hits hundreds+.
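To illustrate the offline workflow the commenter describes, here is a minimal sketch using the Hugging Face Transformers pipeline. The model id openai/gpt-oss-20b matches the published checkpoints; the offline flag and dtype/device settings are illustrative assumptions, and the one-time weight download must happen before going offline.

```python
import os

# Assumes openai/gpt-oss-20b was already downloaded once; after that,
# inference is fully local and network access can be forbidden.
os.environ["HF_HUB_OFFLINE"] = "1"

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # released weights are quantized, keeping memory near 16 GB
    device_map="auto",    # GPU if available, otherwise CPU / Apple Silicon
)

out = generator(
    [{"role": "user", "content": "Summarize the CAP theorem in two sentences."}],
    max_new_tokens=256,
)
print(out[0]["generated_text"])
```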

Microsoft is also bringing GPU-optimized versions of the 20B model to Windows via ONNX Runtime, making it available through Foundry Local and the AI Toolkit for VS Code.

About the Author

Robert Krzaczyński

