Grab Adds Real-Time Data Quality Monitoring to Its Platform
Dec 05, 2025 2 min read
Grab, a Singapore-based digital service delivery platform, has added data quality monitoring to its internal Coban platform to improve the quality of data delivered by Apache Kafka to downstream consumers. The changes are described in the company’s engineering blog. "In the past, monitoring Kafka stream data processing lacked an effective solution for data quality validation," the team stated. "This limitation made it challenging to identify bad data, notify users in a timely manner, and prevent the cascading impact on downstream users from further escalating."
The errors Grab experienced fall into two main types: syntactic and semantic. Syntactic issues are caused by errors in message structure: for example, a producer might send a string value for a field the schema defines as an int, causing consumer applications to crash with deserialization errors. Semantic errors arise when message values are structurally valid but violate business rules or fall outside acceptable limits. A user_id field might be a valid string (syntactically correct) yet violate a semantic rule if it does not conform to the expected company-wide format of 'usr-{8-digits}'.
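The distinction between the two error types can be illustrated with a minimal sketch (not Grab's actual code; the schema, field names, and helper functions are assumptions for illustration):

```python
import re

# Expected field types (syntactic contract) and a semantic rule for user_id.
SCHEMA = {"user_id": str, "amount": int}
USER_ID_RE = re.compile(r"^usr-\d{8}$")  # company-wide 'usr-{8-digits}' format

def syntactic_errors(msg: dict) -> list[str]:
    """Structural check: does each field carry the type the schema declares?"""
    return [
        f"{field}: expected {t.__name__}, got {type(msg.get(field)).__name__}"
        for field, t in SCHEMA.items()
        if not isinstance(msg.get(field), t)
    ]

def semantic_errors(msg: dict) -> list[str]:
    """Value check: a field may be well-typed yet still violate a business rule."""
    errors = []
    user_id = msg.get("user_id")
    if isinstance(user_id, str) and not USER_ID_RE.match(user_id):
        errors.append(f"user_id '{user_id}' does not match 'usr-' + 8 digits")
    return errors

good = {"user_id": "usr-12345678", "amount": 250}
bad_syntax = {"user_id": "usr-12345678", "amount": "250"}  # string where int expected
bad_semantics = {"user_id": "user_42", "amount": 250}      # valid string, wrong format
```

Here `bad_syntax` would break deserialization in a typed consumer, while `bad_semantics` passes every structural check and can only be caught by a rule that understands the value's meaning.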
To solve this, the Grab engineering team implemented a new architecture supporting data contract definition, automated testing, and data quality alerts. The core of this system is a test configuration and transformation engine.
This engine takes topic data schemas, metadata, and test rules as inputs and produces a set of FlinkSQL-based test definitions. A Flink job then executes these tests, consuming messages from production Kafka topics and forwarding any errors to Grab's observability platform. The team selected FlinkSQL because its representation of stream data as dynamic tables let them automatically generate efficient data filters from the test rules.
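One way such a transformation engine might turn a rule definition into a FlinkSQL filter is sketched below. This is an assumption about the approach, not Grab's published code; the table name, function name, and predicate are hypothetical (Flink SQL's built-in `REGEXP` function is used in the example predicate):

```python
def rule_to_flink_sql(topic_table: str, field: str, predicate: str) -> str:
    """Emit a FlinkSQL query selecting rows that VIOLATE the rule,
    so a Flink job can forward them as data-quality errors."""
    return (
        f"SELECT '{field}' AS failed_field, * "
        f"FROM {topic_table} "
        f"WHERE NOT ({predicate})"
    )

# Example: the semantic rule that user_id must match 'usr-' + 8 digits.
query = rule_to_flink_sql(
    topic_table="orders_stream",
    field="user_id",
    predicate="user_id IS NOT NULL AND REGEXP(user_id, '^usr-[0-9]{8}$')",
)
```

Because the dynamic table makes the stream queryable, each generated query is a continuous filter: only violating rows flow onward to the observability platform.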
To simplify what could be an overwhelming task of defining hundreds of field-specific rules, the platform uses an LLM to analyze Kafka stream schemas and anonymized sample data to recommend potential semantic test rules. This feature dramatically accelerates the setup process and helps users identify non-obvious data quality constraints.
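Grab has not published the details of this LLM step, but the input it describes (a schema plus anonymized samples) might be assembled into a prompt along these lines; the function and wording are purely illustrative assumptions:

```python
import json

def build_rule_prompt(schema: dict, samples: list[dict]) -> str:
    """Assemble a prompt asking an LLM to propose semantic test rules
    from a topic schema and anonymized sample messages."""
    return (
        "Given this Kafka topic schema and anonymized sample messages, "
        "suggest semantic data-quality rules as SQL predicates.\n"
        f"Schema: {json.dumps(schema)}\n"
        f"Samples: {json.dumps(samples)}"
    )

prompt = build_rule_prompt(
    {"user_id": "string", "amount": "int"},
    [{"user_id": "usr-12345678", "amount": 250}],
)
```

The recommended rules would still need human review before being promoted into the test configuration, since an LLM can only infer constraints from the patterns visible in the samples.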
Deployed earlier this year, the system now actively monitors data quality across 100+ critical Kafka topics. The team reported that "the solution offers the capability to immediately identify and halt the propagation of invalid data across multiple streams… This accelerates the process of diagnosing and resolving issues, allowing users to swiftly address production data challenges."
This approach aligns with industry best practices that few organizations have yet adopted. According to Confluent's recent 2025 Data Streaming Report, only an estimated 1% of companies have reached the highest maturity level, where "data streaming is a strategic enabler with streams managed as a product." By implementing proactive, contract-based data quality monitoring, Grab is treating its data streams as a reliable product for its internal users.
Grab's platform enhancement is part of a broader industry trend towards adding observability to data pipelines, a space seeing activity from new startups and academic research into real-time data quality metrics.