[フレーム]
BT

InfoQ Software Architects' Newsletter

A monthly overview of things you need to know as an architect or aspiring architect.

View an example

We protect your privacy.

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Unlock the full InfoQ experience

Unlock the full InfoQ experience by logging in! Stay updated with your favorite authors and topics, engage with content, and download exclusive resources.

Log In
or

Don't have an InfoQ account?

Register
  • Stay updated on topics and peers that matter to youReceive instant alerts on the latest insights and trends.
  • Quickly access free resources for continuous learningMinibooks, videos with transcripts, and training materials.
  • Save articles and read at anytimeBookmark articles to read whenever youre ready.

Topics

Choose your language

InfoQ Homepage News Expedia Uses WebSockets and Kafka to Query Near Real-Time Streaming Data

Expedia Uses WebSockets and Kafka to Query Near Real-Time Streaming Data

This item in japanese

Dec 19, 2023 2 min read

Write for InfoQ

Feed your curiosity. Help 550k+ global
senior developers
each month stay ahead.
Get in touch

Expedia created a solution to support querying the clickstream data from their platform in near-real time to enable their product and engineering teams to explore live data while working on new and enhancing existing data-driven functional use cases. The team used a combination of WebSockets, Apache Kafka, and PostgreSQL to allow streaming query results continuously to users’ browsers.

Expedia generates vast amounts of data from many sources, including website interactions. The clickstream data collected while users navigate the website or interact with elements within web pages can provide precious insights into user behavior. Ryan Lacerna, data engineer at Expedia Group (currently at Personio), explains the benefits of querying in near real-time:

To help ensure data quality, one of the challenges is to see the data as soon as it is injected into the pipeline. Traditional methods, such as querying data lakes and data warehouses, take time to process. However, [...] implementing an event-driven tool [...] enables the user to query and view streaming data quickly and efficiently, providing quick feedback to help data producers act and for data consumers to understand what data are being captured for their use cases.

The team opted to leverage WebSockets for bidirectional real-time communication between a web browser and the server. The advantage of using WebSockets is that it can avoid constantly refreshing the server's data. Additionally, WebSockets are based on a single long-lived TCP connection and improve performance while minimizing resource overheads.

The Architecture of the Near Real-Time Querying Solution (Source: Expedia Engineering Blog)

The solution's architecture includes the UI application, the WebSocket handler, and the Filter Worker and uses Apache Kafka topics and the PostgreSQL database. The UI provides a simple query form, allowing the user to specify the type of clickstream event to display and a widget to display query results sent via WebSocket. The UI app uses the SockJS library and STOMP protocol to interact with the server.

On the server side, the WebSocket Handler service handles queries expressed as STOMP messages and delivers streaming results back to the browser. The Handler consumes filtered clickstream events from the Apache Kafka topic. The Filter Worker is responsible for publishing the streams of filtered events based on active queries into the Kafka topic to which the WebSocket handler is subscribed. Both services run with multiple replicas in Kubernetes to achieve scalability.

Services use the PostgreSQL database to synchronize active query details, which include the filtering criteria for clickstream events. The WebSocket Handler persists query filters to the database table and removes them if the user disconnects from the session or the TTL (time-to-live) expires in case of lingering user sessions. The solution relies on the LISTEN/NOTIFY functionality of Postgres to ensure that the Filter Worker keeps its in-memory cache up-to-date based on changes in the database.

The Routing of Filtered Events to Users (Source: Expedia Engineering Blog)

The Filter Worker service significantly reduces the number of events published to the filtered topic compared to the source topic. Messages published to the filtered topic are keyed with the filter ID, which in turn is used by the WebSocket Handler to route messages to the correct user. This approach also allows scaling the WebSocket layer to handle the load as the number of tool users grows.

About the Author

Rafal Gancarz

Show moreShow less

Rate this Article

Adoption
Style

Related Content

The InfoQ Newsletter

A round-up of last week’s content on InfoQ sent out every Tuesday. Join a community of over 250,000 senior developers. View an example

We protect your privacy.

BT

AltStyle によって変換されたページ (->オリジナル) /