InfoQ Homepage Performance & Scalability Content on InfoQ
-
Analyzing Apache Kafka Stretch Clusters: WAN Disruptions, Failure Scenarios, and DR Strategies
Proficient in analyzing the dynamics of Apache Kafka Stretch Clusters, I assess WAN disruptions and devise effective Disaster Recovery (DR) strategies. With deep expertise, I ensure high availability and data integrity across multi-region deployments. My insights optimize operational resilience, safeguarding vital services against service level agreement violations.
on Jun 20, 2025 -
Designing Resilient Event-Driven Systems at Scale
Learn how to design resilient event-driven systems that scale. Explore key patterns like shuffle sharding and decoupling queues to handle load spikes and failures. Understand common pitfalls like over-relying on retries and neglecting observability for robust, scalable architectures.
on May 30, 2025 -
Transforming Legacy Healthcare Systems: a Journey to Cloud-Native Architecture
Discover how Livi navigated the complexities of transitioning MJog, a legacy healthcare system, to a cloud-native architecture, sharing valuable insights for successful tech modernization. Our experience illustrates that transitioning from legacy systems to cloud-based microservices is not a one-time project, but an ongoing journey.
on Nov 18, 2024 -
How Netflix Ensures Highly-Reliable Online Stateful Systems
Building reliable stateful services at scale isn’t a matter of building reliability into the servers, the clients, or the APIs in isolation. By combining smart and meaningful choices for each of these three components, we can build massively scalable, SLO-compliant stateful services at Netflix.
on May 14, 2024 -
Magic Pocket: Dropbox’s Exabyte-Scale Blob Storage System
A horizontally scalable exabyte-scale blob storage system which operates out of multiple regions, Magic Pocket is used to store all of Dropbox’s data. Adopting SMR technology and erasure codes, the system has extremely high durability guarantees but is cheaper than operating in the cloud.
on May 15, 2023 -
Design Pattern Proposal for Autoscaling Stateful Systems
In this article, Rogerio Robetti discusses the challenges in auto-scaling stateful storage systems and proposes an opinionated design solution to automatically scale up (vertical) and scale out (horizontal) from a single node up to several nodes in a cluster with minimum configuration and interference of the operator.
on Jan 25, 2023 -
A Recipe to Migrate and Scale Monoliths in the Cloud
In this article, I want to present a simple cloud architecture that can allow an organization to take monolithic applications to the cloud incrementally without a dramatic change in the architecture. We will discuss the minimal requirements and basic components to take advantage of the scalability of the cloud.
on May 13, 2022 -
Using the Plan-Do-Check-Act Framework to Produce Performant and Highly Available Systems
The PDCA (plan-do-check-act) framework can be used to outline the performance, availability, and monitoring to enable teams to ensure performant and highly available applications. These include infrastructure design and setup, application architecture and design, coding, performance testing, and application monitoring.
on Jun 09, 2021 -
Donkey: a Highly-Performant HTTP Stack for Clojure
Donkey is the product of the quest for a highly performant Clojure HTTP stack aimed to scale at the rapid pace of growth we have been experiencing at AppsFlyer, and save us computing costs. In this article, we’ll briefly outline the use-case for a library like Donkey and present our benchmarks. Finally, we will discuss Clojure and immutability, and some of our design decisions.
on Jan 12, 2021 -
Four Techniques Serverless Platforms Use to Balance Performance and Cost
There are two aspects that have been key to the rapid adoption of serverless computing: the performance and the cost model. This article looks at those aspects, the tradeoffs, and opportunity ahead.
on Feb 13, 2019 -
Scaling a Distributed Stream Processor in a Containerized Environment
The article presents our experience of scaling a distributed stream processor in Kubernetes. The stream processor should provide support for maintaining the optimal level of parallelism. However, adding more resources incurs additional cost and also it does not guarantee performance improvements. Instead, the stream processor should identify the level of resource requirement and scale accordingly.
on Jan 13, 2019 -
Columnar Databases and Vectorization
In this article, author Siddharth Teotia discusses the Dremio database which is based on Apache Arrow with vectorization capabilities.
on May 27, 2018