InfoQ News

AI, ML & Data Engineering

Google Launched LangExtract, a Python Library for Structured Data Extraction from Unstructured Text

Aug 08, 2025 2 min read

Daniel Dominguez

Google has introduced LangExtract, an open-source Python library that helps developers extract structured information from unstructured text using large language models (LLMs) such as Gemini. The library converts free-form text, including clinical notes, legal texts, and customer feedback, into structured data. Developers define an extraction task with natural language instructions and a few examples, which makes it easier to process and organize information from many kinds of unstructured content.
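To make the idea of defining a task with instructions and examples concrete, here is a minimal sketch of a few-shot prompt builder. It is not LangExtract's API: the Extraction and Example types, the build_prompt function, and the prompt layout are all hypothetical names chosen for illustration, and the sketch uses only the Python standard library.

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:
    """One labeled item to pull out of a text (hypothetical type)."""
    label: str  # e.g. "medication"
    text: str   # exact wording copied from the source


@dataclass
class Example:
    """A worked example: input text plus the extractions expected from it."""
    text: str
    extractions: list[Extraction] = field(default_factory=list)


def build_prompt(instruction: str, examples: list[Example], document: str) -> str:
    """Assemble a few-shot prompt from an instruction, examples, and new text."""
    parts = [instruction.strip(), ""]
    for ex in examples:
        parts.append(f"Text: {ex.text}")
        for item in ex.extractions:
            parts.append(f"- {item.label}: {item.text}")
        parts.append("")  # blank line between examples
    parts.append(f"Text: {document}")
    return "\n".join(parts)


examples = [
    Example(
        text="Patient was given 250 mg amoxicillin.",
        extractions=[
            Extraction("medication", "amoxicillin"),
            Extraction("dosage", "250 mg"),
        ],
    )
]
prompt = build_prompt(
    "Extract medications and dosages using the exact wording of the text.",
    examples,
    "Start ibuprofen 400 mg as needed.",
)
print(prompt)
```

The resulting string would be sent to whichever model backs the extraction. The examples matter as much as the instruction, since they show the model the labels and the level of detail expected.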

One of LangExtract’s standout features is its use of controlled generation techniques, which keep the extracted information consistently formatted. The library also ties each extracted entity to the exact span of text it came from and highlights that span, so every result can be traced back to its location in the original document. This traceability makes extractions more transparent and easier to verify.
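The traceability described here can be illustrated with a small sketch that maps an extracted string back to character offsets in the source. This is an assumption about the general technique, not LangExtract's implementation: GroundedExtraction and ground are hypothetical names, and exact substring search stands in for whatever alignment the library performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroundedExtraction:
    """An extracted item plus its location in the source (hypothetical type)."""
    label: str
    text: str
    start: Optional[int]  # character offset in the source, None if not found
    end: Optional[int]    # exclusive end offset, None if not found


def ground(source: str, label: str, text: str, search_from: int = 0) -> GroundedExtraction:
    """Locate the extracted text in the source by exact substring search."""
    start = source.find(text, search_from)
    if start == -1:
        # The model returned wording that does not appear in the source.
        return GroundedExtraction(label, text, None, None)
    return GroundedExtraction(label, text, start, start + len(text))


source = "Patient was given 250 mg amoxicillin for ten days."
hit = ground(source, "medication", "amoxicillin")
miss = ground(source, "medication", "ibuprofen")

# A grounded result can be checked by slicing the source at its offsets.
print(source[hit.start:hit.end], miss.start)
```

An extraction with no offsets is a useful signal in its own right: it flags output that cannot be verified against the document.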

To handle long and complex documents, LangExtract combines text chunking, parallel processing, and multiple extraction passes. These techniques improve recall and accuracy on large bodies of text. As a result, LangExtract can be applied across domains, from healthcare to legal documents, without extensive fine-tuning of the underlying models.
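The combination of chunking, parallel processing, and repeated passes can be sketched as follows. This is a minimal illustration under stated assumptions, not LangExtract's code: chunk_sentences, stub_extract, and extract_document are hypothetical names, the sentence splitter is deliberately naive, and a regular expression stands in for the model call so the example runs offline.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def chunk_sentences(text: str, max_chars: int) -> list[tuple[int, str]]:
    """Group whole sentences into chunks, returned as (offset, chunk) pairs.

    Chunks stay within max_chars where possible; a single sentence longer
    than max_chars becomes its own oversized chunk.
    """
    chunks = []
    start = None
    end = 0
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    for m in re.finditer(r".+?(?:[.!?]+\s+|\Z)", text, flags=re.S):
        if start is None:
            start = m.start()
        elif m.end() - start > max_chars:
            chunks.append((start, text[start:end]))
            start = m.start()
        end = m.end()
    if start is not None:
        chunks.append((start, text[start:end]))
    return chunks


def stub_extract(offset_and_chunk: tuple[int, str]) -> set[tuple[int, str]]:
    """Stand-in for a model call: finds dosages and reports document offsets."""
    offset, chunk = offset_and_chunk
    return {(offset + m.start(), m.group()) for m in re.finditer(r"\d+ mg", chunk)}


def extract_document(text: str, max_chars: int = 40, passes: int = 2, workers: int = 4):
    """Run the extractor over all chunks in parallel and merge repeated passes."""
    chunks = chunk_sentences(text, max_chars)
    found = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(passes):
            # With a real model each pass may surface different items;
            # taking the union across passes is what raises recall.
            for result in pool.map(stub_extract, chunks):
                found |= result
    return sorted(found)


text = ("Give 250 mg amoxicillin twice daily. "
        "Add ibuprofen 400 mg as needed. Review in ten days.")
chunks = chunk_sentences(text, 40)
results = extract_document(text)
print(results)
```

Because the stub is deterministic, the second pass adds nothing here. The structure is what matters: chunks are independent, so they can be processed concurrently, and offsets are kept relative to the full document so results from different chunks can be merged.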

LangExtract can be integrated with various LLMs, including cloud-based models like Gemini and local models via platforms such as Ollama. This flexibility makes it a versatile tool for developers working across different models. It enables users to define extraction tasks for a wide range of applications without requiring deep expertise in machine learning.
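One common way to support both hosted and local models is to code against a small interface and swap the implementation behind it. The sketch below shows that pattern only; it makes no claim about how LangExtract is structured internally. LanguageModel, EchoModel, and run_extraction are hypothetical names, and EchoModel is an offline stand-in that performs no network calls.

```python
from typing import Protocol


class LanguageModel(Protocol):
    """Anything that turns a prompt into text can act as a backend."""

    def generate(self, prompt: str) -> str: ...


class EchoModel:
    """Offline stand-in for a hosted or local model (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str) -> str:
        # A real backend would call a model here; this one echoes the prompt.
        return f"[{self.name}] {prompt}"


def run_extraction(model: LanguageModel, instruction: str, document: str) -> str:
    """Extraction logic depends only on the interface, not on a provider."""
    return model.generate(f"{instruction}\n\nText: {document}")


cloud = EchoModel("cloud-model")
local = EchoModel("local-model")
for backend in (cloud, local):
    print(run_extraction(backend, "Extract names.", "Ada met Alan."))
```

With this shape, moving from a cloud model to a local one changes which object is passed in, while the task definition stays the same.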

The release of LangExtract has sparked enthusiastic responses within the developer community. Akshay Goel, a key contributor, said he was excited about the release and eager to see what users build with it, posting:

Excited to release LangExtract alongside the team today and looking forward to seeing what the developer community builds with it!

Developer Kyle Brown described it as a major step forward in AI transparency, converting unstructured text into structured, understandable data. Adding to the momentum, a community member announced a TypeScript port of LangExtract that supports OpenAI models as well as Google’s Gemini, a sign of the community's active involvement:

For anyone who is interested -- I ported this to typescript and added an ability to use OpenAI not just Gemini.

The library is available under the Apache 2.0 license and can be installed via pip. It gives developers an accessible way to add information extraction capabilities to their applications.
