InfoQ Software Architects' Newsletter

A monthly overview of things you need to know as an architect or aspiring architect.

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Unlock the full InfoQ experience

Unlock the full InfoQ experience by logging in! Stay updated with your favorite authors and topics, engage with content, and download exclusive resources.

Don't have an InfoQ account?

Stay updated on topics and peers that matter to youReceive instant alerts on the latest insights and trends.
Quickly access free resources for continuous learningMinibooks, videos with transcripts, and training materials.
Save articles and read at anytimeBookmark articles to read whenever youre ready.

Logo - Back to homepage

News Articles Presentations Podcasts Guides

Topics

Development

Featured in Development

Developing Meta's Orion AR Glasses

Jinsong Yu shares deep architectural insights into the Orion AR glasses, detailing the use of 11 custom microcontrollers for thermal dissipation, the SLAM/VIO needed for world-locked rendering, and input fusion (EMG, eye/hand tracking). He concludes with critical lessons for technical leaders on setting direction, managing complexity through testing, and strategic hardware-software co-design.

Developing Meta's Orion AR Glasses

All in development

Architecture & Design

Featured in Architecture & Design

Building Resilient Platforms: Insights from over Twenty Years in Mission-Critical Infrastructure

Building resilient platforms requires understanding the art and science of creating infrastructure that others depend on for critical applications. This perspective applies to anyone who builds software consumed by others at scale. Whether developing infrastructure platforms, software development platforms, or messaging systems, principles address how to build software that others consume at scale

Building Resilient Platforms: Insights from over Twenty Years in Mission-Critical Infrastructure

All in architecture-design

AI Infrastructure

Featured in AI, ML & Data Engineering

Reducing False Positives in Retrieval-Augmented Generation (RAG) Semantic Caching: A Banking Case Study

In this article, author Elakkiya Daivam discusses why Retrieval Augmented Generation (RAG) and semantic caching techniques are powerful levers for reducing false positives in AI powered applications. She shares the insights from a production-grade evaluation with 1,000 query variations tested across seven bi-encoder models.

Reducing False Positives in Retrieval-Augmented Generation (RAG) Semantic Caching: A Banking Case Study

All in ai-ml-data-eng

Culture & Methods

Featured in Culture & Methods

AI Amplifies Team Strengths and Weaknesses in Software Development

In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Jon Kern and Anita Zbieg about how AI amplifies both delivery efficiency and weaknesses in development teams, the importance of fundamental collaboration practices, and maintaining holistic system thinking.

AI Amplifies Team Strengths and Weaknesses in Software Development

All in culture-methods

DevOps

Featured in DevOps

When Reverse Proxies Surprise You: Hard Lessons from Operating at Scale

Operating massive reverse proxy fleets reveals hard lessons: optimizations that work on smaller systems fail at scale; mundane oversights like missing commas cause major outages; and abstractions meant to simplify become hidden fragility points. Success requires profiling on target hardware, relentlessly monitoring boring details, keeping hot paths lean, and trusting instrumentation over theory.

When Reverse Proxies Surprise You: Hard Lessons from Operating at Scale

All in devops

Events

Helpful links

Choose your language

QCon San Francisco 2025

Learn how senior teams scaled GenAI search, migrated ML systems, & built global streaming platforms.

Register now.

QCon AI New York 2025

Go from AI demos to real engineering impact. Learn to embed LLMs, govern & scale securely.

Early Bird ends Dec 9.

QCon London 2026

Benchmark against top engineering teams. Learn what works in AI, architecture, data, security & FinTech.

Early Bird ends Dec 9.

QCon AI Boston

Learn how leading engineering teams run AI in production—reliably, securely, and at scale.

Launch pricing ends Dec 9.

InfoQ Homepage News Discord Migrates Trillions of Messages from Cassandra to ScyllaDB

Architecture & Design

Discord Migrates Trillions of Messages from Cassandra to ScyllaDB

This item in japanese

Jun 22, 2023 2 min read

Rafal Gancarz

Write for InfoQ

Feed your curiosity. Help 550k+ global
senior developers
each month stay ahead.Get in touch

Reading list

Discord has migrated trillions of message records from Apache Cassandra to ScyllaDB, reducing the size of the largest cluster from 177 Cassandra nodes to 72 ScyllaDB nodes and reducing tail latencies for reads and writes. The move has unlocked new product use cases because of the improved database stability and performance.

As Discord grew, it migrated its data from MongoDB to Cassandra in 2017 because it was looking for a scalable database to handle ever-growing data volumes. Initially, the Cassandra cluster consisted of 12 nodes and stored billions of messages. Still, after five years, the cluster had 177 nodes and was frequently experiencing performance problems, forcing the team to reduce some of the maintenance operations, which became too expensive to run.

Some performance issues were caused by hot partitions resulting from the table schema design, where partitioning was based on the Discord channel and time bucket. Bo Ingram, a senior software engineer at Discord, explains the impact of hot partitions on the database cluster:

When we encountered a hot partition, it frequently affected latency across our entire database cluster. One channel and bucket pair received a large amount of traffic, and latency in the node would increase as the node tried harder and harder to serve traffic and fell further and further behind. [...] Since we perform reads and writes with a quorum consistency level, all queries to the nodes that serve the hot partition suffer latency increases, resulting in a broader end-user impact.

Based on the experimenting and testing done internally, the team has decided to move its data across all clusters to ScyllaDB. They opted for ScyllaDB primarily to improve performance, including avoiding garbage-collection-related issues they experienced with Cassandra. They also worked with the ScyllaDB team to improve some use cases they depended on, like reverse queries.

After migrating all smaller clusters by 2020, the team prepared to migrate the biggest cluster, containing trillions of messages. To minimize the hot partition problem, they created a new intermediary service layer in their architecture, named data services, written in Rust and interfaced via gRPC API.

One important responsibility of data services is request coalescing, which avoids multiple database calls when many users request the same message. Secondly, the team implemented consistent hash-based routing to data service instances based on a routing key, such as a channel id. Together, these changes significantly reduced hot partition problems, giving Discord extra time to prepare for the big migration.

Source: https://discord.com/blog/how-discord-stores-trillions-of-messages

For the migration itself, the team first considered using ScyllaDB’s Apache Spark migrator but, in the end, decided to implement a bespoke solution in Rust with SQLite used for checkpointing, which allowed them to shorten the migration time from three months to nine days. After addressing some minor hiccups, they validated the completed migration and switched to ScyllaDB in May 2022. Since then, the new cluster has proven stable and provided consistent performance, which, together with the data service layer, allowed it to handle extra traffic generated by the World Cup gracefully.

About the Author

Rafal Gancarz

Show moreShow less

This content is in the Performance topic

The InfoQ Newsletter

A round-up of last week’s content on InfoQ sent out every Tuesday. Join a community of over 250,000 senior developers. View an example

We protect your privacy.

InfoQ Software Architects' Newsletter

Discord Migrates Trillions of Messages from Cassandra to ScyllaDB

Write for InfoQ

About the Author

Rafal Gancarz

Rate this Article

This content is in the Performance topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

Related Content

The InfoQ Newsletter