
Grab Adds Real-Time Data Quality Monitoring to Its Platform

Dec 05, 2025 2 min read


Grab, a Singapore-based digital service delivery platform, added data quality monitoring to its internal Coban platform to improve the quality of data delivered by Apache Kafka to downstream consumers. The changes are described in the company’s engineering blog. "In the past, monitoring Kafka stream data processing lacked an effective solution for data quality validation," the team stated. "This limitation made it challenging to identify bad data, notify users in a timely manner, and prevent the cascading impact on downstream users from further escalating."

The errors Grab experienced fell into two main types: syntactic and semantic. Syntactic issues are caused by errors in message structure. For example, a producer might send a string value for a field the schema defines as an int, causing consumer applications to crash with deserialization errors. Semantic errors arise when the data values in a message are malformed in content or fall outside acceptable limits. A user_id field might be a valid string (syntactically correct) but violate a semantic rule if it does not conform to the expected company-wide format of 'usr-{8-digits}'.
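The distinction between the two error classes can be sketched in a few lines of Python. This is an illustrative assumption, not Grab's actual validation code: the schema, record shapes, and helper functions are hypothetical, and only the `usr-{8-digits}` rule comes from the article.

```python
import re

# Semantic rule from the article's example: user IDs must be 'usr-' + 8 digits.
USER_ID_PATTERN = re.compile(r"^usr-\d{8}$")

def syntactic_errors(record: dict, schema: dict) -> list:
    """Flag fields whose runtime type does not match the declared schema type."""
    type_map = {"int": int, "string": str}
    return [
        f"{field}: expected {declared}"
        for field, declared in schema.items()
        if not isinstance(record.get(field), type_map[declared])
    ]

def semantic_errors(record: dict) -> list:
    """Flag values that are well-typed but violate a business rule."""
    errors = []
    user_id = record.get("user_id")
    if isinstance(user_id, str) and not USER_ID_PATTERN.match(user_id):
        errors.append(f"user_id: {user_id!r} does not match usr-{{8-digits}}")
    return errors

schema = {"order_count": "int", "user_id": "string"}
bad_type = {"order_count": "42", "user_id": "usr-12345678"}  # string where int expected
bad_value = {"order_count": 3, "user_id": "user-001"}        # well-typed, wrong format
clean = {"order_count": 3, "user_id": "usr-12345678"}
```

The first record fails syntactically (it would crash a typed deserializer), while the second passes deserialization and only a content-level rule catches it.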

To solve this, the Grab engineering team implemented a new architecture supporting data contract definition, automated testing, and data quality alerts. The core of this system is a test configuration and transformation engine.

This engine takes topic data schemas, metadata, and test rules as inputs and generates a set of FlinkSQL-based test definitions. A Flink job then executes these tests, consuming messages from production Kafka topics and forwarding any errors to Grab's observability platform. FlinkSQL was selected because its ability to represent stream data as dynamic tables let the team automatically generate data filters from the test rules and execute them efficiently.
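A minimal sketch of such a transformation step, under stated assumptions: the rule format, function name, and generated SQL shape below are hypothetical (Grab has not published its engine's internals). The idea is that each rule becomes a FlinkSQL query whose WHERE clause selects only the violating rows, which the Flink job then forwards as errors.

```python
def rule_to_flink_sql(topic: str, field: str, predicate: str) -> str:
    """Render one data-quality rule as a FlinkSQL test definition.

    Flink's dynamic tables let a Kafka topic be queried like a table,
    so a quality test reduces to a filter that matches only bad records.
    """
    return (
        f"SELECT '{field}' AS failed_field, {field} AS bad_value "
        f"FROM `{topic}` WHERE NOT ({predicate})"
    )

# Hypothetical semantic rule: user_id must match the usr-{8-digits} format.
# REGEXP(str, pattern) is a Flink SQL built-in string function.
query = rule_to_flink_sql(
    topic="orders",
    field="user_id",
    predicate="REGEXP(user_id, '^usr-[0-9]{8}$')",
)
```

A real engine would also have to escape identifiers and parameterize the sink, but the core transformation (rule in, filter query out) is this simple.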

To simplify what could be an overwhelming task of defining hundreds of field-specific rules, the platform uses an LLM to analyze Kafka stream schemas and anonymized sample data to recommend potential semantic test rules. This feature dramatically accelerates the setup process and helps users identify non-obvious data quality constraints.
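One way such a recommender can be wired up, sketched as a hypothetical prompt builder. The blog does not publish Grab's prompts, model choice, or output format, so everything here (function name, prompt wording, JSON output convention) is an illustrative assumption; the only grounded part is the inputs, a topic schema plus anonymized sample data.

```python
import json

def build_rule_suggestion_prompt(schema: dict, anonymized_samples: list) -> str:
    """Assemble an LLM prompt from a topic schema and anonymized sample records.

    The model is asked to propose candidate semantic rules, which users
    can then review before adopting them as enforced test definitions.
    """
    return (
        "You are a data-quality assistant. Given this Kafka topic schema and "
        "anonymized sample messages, suggest semantic validation rules "
        "(formats, ranges, allowed values) as a JSON list.\n\n"
        f"Schema:\n{json.dumps(schema, indent=2)}\n\n"
        f"Samples:\n{json.dumps(anonymized_samples, indent=2)}\n"
    )

prompt = build_rule_suggestion_prompt(
    schema={"user_id": "string", "fare_amount": "double"},
    anonymized_samples=[{"user_id": "usr-00000001", "fare_amount": 12.5}],
)
```

Keeping the sample data anonymized, as the article notes, matters here because the records are leaving the pipeline and being sent to a model.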

Deployed earlier this year, the system now actively monitors data quality across 100+ critical Kafka topics. The team reported that "the solution offers the capability to immediately identify and halt the propagation of invalid data across multiple streams… This accelerates the process of diagnosing and resolving issues, allowing users to swiftly address production data challenges."

This approach reflects an industry best practice that few organizations have yet adopted. According to Confluent's recent 2025 Data Streaming Report, only an estimated 1% of companies have reached the highest maturity level, where "data streaming is a strategic enabler with streams managed as a product." By implementing proactive, contract-based data quality monitoring, Grab is treating its data streams as a reliable product for its internal users.

Grab's platform enhancement is part of a broader industry trend towards adding observability to data pipelines, a space seeing activity from new startups and academic research into real-time data quality metrics.

About the Author

Patrick Farry

