InfoQ Homepage News Discord Migrates Trillions of Messages from Cassandra to ScyllaDB
Discord Migrates Trillions of Messages from Cassandra to ScyllaDB
This item in japanese
Jun 22, 2023 2 min read
Write for InfoQ
Feed your curiosity. Help 550k+ globalsenior developers
each month stay ahead.Get in touch
Discord has migrated trillions of message records from Apache Cassandra to ScyllaDB, reducing the size of the largest cluster from 177 Cassandra nodes to 72 ScyllaDB nodes and reducing tail latencies for reads and writes. The move has unlocked new product use cases because of the improved database stability and performance.
As Discord grew, it migrated its data from MongoDB to Cassandra in 2017 because it was looking for a scalable database to handle ever-growing data volumes. Initially, the Cassandra cluster consisted of 12 nodes and stored billions of messages. Still, after five years, the cluster had 177 nodes and was frequently experiencing performance problems, forcing the team to reduce some of the maintenance operations, which became too expensive to run.
Some performance issues were caused by hot partitions resulting from the table schema design, where partitioning was based on the Discord channel and time bucket. Bo Ingram, a senior software engineer at Discord, explains the impact of hot partitions on the database cluster:
When we encountered a hot partition, it frequently affected latency across our entire database cluster. One channel and bucket pair received a large amount of traffic, and latency in the node would increase as the node tried harder and harder to serve traffic and fell further and further behind. [...] Since we perform reads and writes with a quorum consistency level, all queries to the nodes that serve the hot partition suffer latency increases, resulting in a broader end-user impact.
Based on the experimenting and testing done internally, the team has decided to move its data across all clusters to ScyllaDB. They opted for ScyllaDB primarily to improve performance, including avoiding garbage-collection-related issues they experienced with Cassandra. They also worked with the ScyllaDB team to improve some use cases they depended on, like reverse queries.
After migrating all smaller clusters by 2020, the team prepared to migrate the biggest cluster, containing trillions of messages. To minimize the hot partition problem, they created a new intermediary service layer in their architecture, named data services, written in Rust and interfaced via gRPC API.
One important responsibility of data services is request coalescing, which avoids multiple database calls when many users request the same message. Secondly, the team implemented consistent hash-based routing to data service instances based on a routing key, such as a channel id. Together, these changes significantly reduced hot partition problems, giving Discord extra time to prepare for the big migration.
Source: https://discord.com/blog/how-discord-stores-trillions-of-messages
For the migration itself, the team first considered using ScyllaDB’s Apache Spark migrator but, in the end, decided to implement a bespoke solution in Rust with SQLite used for checkpointing, which allowed them to shorten the migration time from three months to nine days. After addressing some minor hiccups, they validated the completed migration and switched to ScyllaDB in May 2022. Since then, the new cluster has proven stable and provided consistent performance, which, together with the data service layer, allowed it to handle extra traffic generated by the World Cup gracefully.
This content is in the Performance topic
Related Topics:
-
Related Editorial
-
Related Sponsors
-
Popular across InfoQ
-
TanStack Start: A New Meta Framework Powered by React or SolidJS
-
Microsoft Patches Critical ASP.NET Core Vulnerability with 9.9 Severity Score
-
GitHub Expands Copilot Ecosystem with AgentHQ
-
Redis Critical Remote Code Execution Vulnerability Discovered after 13 Years
-
Monzo’s Real-Time Fraud Detection Architecture with BigQuery and Microservices
-
Architecture Should Model the World as It Really Is: A Conversation with Randy Shoup
-
Related Content
The InfoQ Newsletter
A round-up of last week’s content on InfoQ sent out every Tuesday. Join a community of over 250,000 senior developers. View an example