Scalability, Availability & Stability Patterns Jonas Bonér CTO Typesafe twitter: @jboner
General recommendations
• Immutability as the default
• Referential Transparency (FP)
• Laziness
• Think about your data: different kinds of data need different guarantees
Trade-offs
• Performance vs Scalability
• Latency vs Throughput
• Availability vs Consistency
Performance vs Scalability
How do I know if I have a performance problem?
How do I know if I have a performance problem? If your system is slow for a single user
How do I know if I have a scalability problem?
How do I know if I have a scalability problem? If your system is fast for a single user but slow under heavy load
You should strive for maximal throughput with acceptable latency
Availability vs Consistency
CAP theorem: at a given point in time you can only pick 2 of Consistency, Availability, and Partition tolerance
Centralized system
• In a centralized system (RDBMS etc.) we don't have network partitions, i.e. no P in CAP
• So you get both:
• Availability
• Consistency
ACID: Atomic, Consistent, Isolated, Durable
Distributed system
• In a distributed system we (will) have network partitions, i.e. the P in CAP
• So you get to pick only one:
• Availability
• Consistency
CAP in practice:
• ...there are only two types of systems: 1. CP 2. AP
• ...there is only one choice to make: in case of a network partition, what do you sacrifice? 1. C: Consistency 2. A: Availability
BASE: Basically Available, Soft state, Eventually consistent
Eventual Consistency ...is an interesting trade-off
Eventual Consistency ...is an interesting trade-off But let’s get back to that later
Availability Patterns
• Fail-over
• Replication
• Master-Slave
• Tree replication
• Master-Master
• Buddy Replication
What do we mean by Availability?
Fail-over (diagram copyright Michael Nygard)
Fail-over: but fail-over is not always this simple (diagram copyright Michael Nygard)
Fail-back (diagram copyright Michael Nygard)
Replication
• Active replication - Push
• Passive replication - Pull
• Data not available, read from peer, then store it locally
• Works well with timeout-based caches
Replication
• Master-Slave replication
• Tree Replication
• Master-Master replication
• Buddy replication
Master-Master Replication
Scalability Patterns: State
Scalability Patterns: State
• Partitioning
• HTTP Caching
• RDBMS Sharding
• NOSQL
• Distributed Caching
• Data Grids
• Concurrency
HTTP Caching Reverse Proxy • Varnish • Squid • rack-cache • Pound • Nginx • Apache mod_proxy • Traffic Server
Generate Static Content Precompute content • Homegrown + cron or Quartz • Spring Batch • Gearman • Hadoop • Google Data Protocol • Amazon Elastic MapReduce
HTTP Caching First request
HTTP Caching Subsequent request
Service of Record
• Relational Databases (RDBMS)
• NOSQL Databases
Sharding
• Partitioning
• Replication
ORM + rich domain model anti-pattern
• Attempt: read an object from DB
• Result: you sit with your whole database in your lap
Think about your data. Think again.
• When do you need ACID?
• When is Eventually Consistent a better fit?
• Different kinds of data have different needs
When is an RDBMS not good enough?
Scaling reads to an RDBMS is hard
Scaling writes to an RDBMS is impossible
Do we really need an RDBMS?
Do we really need an RDBMS? Sometimes...
Do we really need an RDBMS? But many times we don't
NOSQL
• Key-Value databases
• Column databases
• Document databases
• Graph databases
• Datastructure databases
Who’s ACID? • Relational DBs (MySQL, Oracle, Postgres) • Object DBs (Gemstone, db4o) • Clustering products (Coherence, Terracotta) • Most caching products (ehcache)
Who's BASE? Distributed databases
• Cassandra
• Riak
• Voldemort
• Dynomite
• SimpleDB
• etc.
• Google: Bigtable • Amazon: Dynamo • Amazon: SimpleDB • Yahoo: HBase • Facebook: Cassandra • LinkedIn: Voldemort NOSQL in the wild
But first some background...
Chord & Pastry
• Distributed Hash Tables (DHT)
• Scalable
• Partitioned
• Fault-tolerant
• Decentralized
• Peer to peer
• Popularized:
• Node ring
• Consistent Hashing
Node ring with Consistent Hashing: find data in log(N) hops
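A minimal Java sketch of such a node ring (all class and method names are invented for illustration): nodes and keys hash onto the same ring, a lookup walks clockwise to the first node at or after the key's position, and virtual nodes smooth the key distribution so adding or removing a node only remaps keys adjacent to it on the ring.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes = 100; // smooths the key distribution

    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++)
            ring.put(hash(node + "#" + i), node);
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++)
            ring.remove(hash(node + "#" + i));
    }

    // Walk clockwise from the key's position to the first node.
    public String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            // fold the first four digest bytes into a ring position
            return ((long) (d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                 | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (Exception e) { throw new RuntimeException(e); }
    }
}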
"How can we build a DB on top of Google File System?" • Paper: Bigtable:A distributed storage system for structured data, 2006 • Rich data-model, structured storage • Clones: HBase Hypertable Neptune Bigtable
"How can we build a distributed hash table for the data center?" • Paper: Dynamo:Amazon’s highly available key- value store, 2007 • Focus: partitioning, replication and availability • Eventually Consistent • Clones: Voldemort Dynomite Dynamo
Types of NOSQL stores
• Key-Value databases (Voldemort, Dynomite)
• Column databases (Cassandra, Vertica, Sybase IQ)
• Document databases (MongoDB, CouchDB)
• Graph databases (Neo4J, AllegroGraph)
• Datastructure databases (Redis, Hazelcast)
Distributed Caching
• Write-through
• Write-behind
• Eviction Policies
• Replication
• Peer-To-Peer (P2P)
Eviction policies • TTL (time to live) • Bounded FIFO (first in first out) • Bounded LIFO (last in first out) • Explicit cache invalidation
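As a sketch of a bounded FIFO policy in plain Java (a hypothetical cache class, not taken from any of the products below): LinkedHashMap can evict the oldest entry once a capacity is exceeded.

import java.util.LinkedHashMap;
import java.util.Map;

// Bounded FIFO cache: evicts the oldest entry once capacity is reached.
// Pass 'true' as the third constructor arg for LRU ordering instead.
public class BoundedFifoCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public BoundedFifoCache(int capacity) {
        super(16, 0.75f, false); // false = insertion order (FIFO)
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict when over capacity
    }
}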
Peer-To-Peer • Decentralized • No "special" or "blessed" nodes • Nodes can join and leave as they please
Distributed Caching Products
• EHCache
• JBoss Cache
• OSCache
• memcached
memcached
• Very fast
• Simple
• Key-Value (string -> binary)
• Clients for most languages
• Distributed
• Not replicated - so 1/N chance for local access in cluster
Data Grids/Clustering Parallel data storage • Data replication • Data partitioning • Continuous availability • Data invalidation • Fail-over • C + P in CAP
Data Grids/Clustering Products
• Coherence
• Terracotta
• GigaSpaces
• GemStone
• Tibco ActiveMatrix
• Hazelcast
Concurrency
• Shared-State Concurrency
• Message-Passing Concurrency
• Dataflow Concurrency
• Software Transactional Memory
Shared-State Concurrency
• Everyone can access anything anytime
• Totally indeterministic
• Introduce determinism at well-defined places...
• ...using locks
Shared-State Concurrency
• Problems with locks:
• Locks do not compose
• Taking too few locks
• Taking too many locks
• Taking the wrong locks
• Taking locks in the wrong order
• Error recovery is hard
Shared-State Concurrency
Please use java.util.concurrent.*
• ConcurrentHashMap
• BlockingQueue
• ConcurrentQueue
• ExecutorService
• ReentrantReadWriteLock
• CountDownLatch
• ParallelArray
• and much, much more...
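A small hedged example of leaning on these building blocks instead of hand-rolled locks (all names invented): a thread pool updates a shared counter map atomically, and a latch signals completion.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PageHitCounter {
    public static void main(String[] args) throws InterruptedException {
        String[] hits = {"/home", "/about", "/home", "/faq", "/home"};
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(hits.length);

        for (String page : hits) {
            pool.execute(() -> {
                counts.merge(page, 1, Integer::sum); // atomic update, no explicit lock
                done.countDown();
            });
        }
        done.await();   // wait for all tasks without busy-waiting
        pool.shutdown();
        System.out.println(counts); // e.g. {/faq=1, /about=1, /home=3}
    }
}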
Message-Passing Concurrency
Actors
• Originates in a 1973 paper by Carl Hewitt
• Implemented in Erlang, Occam, Oz
• Encapsulates state and behavior
• Closer to the definition of OO than classes
Actors
• Share NOTHING
• Isolated lightweight processes
• Communicate through messages
• Asynchronous and non-blocking
• No shared state... hence, nothing to synchronize
• Each actor has a mailbox (message queue)
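To make the mailbox idea concrete, here is a deliberately minimal hand-rolled actor in Java. A real application would use one of the libraries listed below; this sketch only shows the shape: isolated state, a mailbox, asynchronous sends, one message processed at a time.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CounterActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0; // state owned by this actor alone

    public void send(String msg) { mailbox.offer(msg); } // async, non-blocking

    @Override
    public void run() {
        try {
            while (true) {
                String msg = mailbox.take(); // process one message at a time
                if ("stop".equals(msg)) return;
                if ("inc".equals(msg)) count++;
                if ("print".equals(msg)) System.out.println(count);
            }
        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}

// Usage: CounterActor a = new CounterActor(); new Thread(a).start();
//        a.send("inc"); a.send("print"); a.send("stop");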
Actors
• Easier to reason about
• Raised abstraction level
• Easier to avoid:
– Race conditions
– Deadlocks
– Starvation
– Livelocks
Actor libs for the JVM
• Akka (Java/Scala)
• scalaz actors (Scala)
• Lift Actors (Scala)
• Scala Actors (Scala)
• Kilim (Java)
• Jetlang (Java)
• Actor's Guild (Java)
• Actorom (Java)
• FunctionalJava (Java)
• GPars (Groovy)
Dataflow Concurrency
• Declarative
• No observable non-determinism
• Data-driven - threads block until data is available
• On-demand, lazy
• No difference between concurrent and sequential code
• Limitation: can't have side-effects
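A sketch of the core primitive, a single-assignment dataflow variable, in plain Java (an illustrative class, not from any particular library): readers block until the value is bound, and binding may happen only once, which is what removes observable non-determinism.

import java.util.concurrent.CountDownLatch;

public class DataflowVar<T> {
    private final CountDownLatch bound = new CountDownLatch(1);
    private volatile T value;

    // Bind exactly once (sketch: no null values, no concurrent binders).
    public void bind(T v) {
        if (value != null) throw new IllegalStateException("already bound");
        value = v;
        bound.countDown(); // wake up all blocked readers
    }

    public T get() throws InterruptedException {
        bound.await(); // data-driven: block until data is available
        return value;
    }
}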
STM: Software Transactional Memory
STM: overview
• See the memory (heap and stack) as a transactional dataset
• Similar to a database: begin, commit, abort/rollback
• Transactions are retried automatically upon collision
• Rolls back the memory on abort
STM: overview
• Transactions can nest
• Transactions compose (yippee!!)

atomic {
  ...
  atomic {
    ...
  }
}
STM: restrictions
All operations in the scope of a transaction need to be idempotent
STM libs for the JVM
• Akka (Java/Scala)
• Multiverse (Java)
• Clojure STM (Clojure)
• CCSTM (Scala)
• Deuce STM (Java)
Scalability Patterns: Behavior
Scalability Patterns: Behavior
• Event-Driven Architecture
• Compute Grids
• Load-balancing
• Parallel Computing
Event-Driven Architecture
"Four years from now, 'mere mortals' will begin to adopt an event-driven architecture (EDA) for the sort of complex event processing that has been attempted only by software gurus [until now]"
-- Roy Schulte (Gartner), 2003
Event-Driven Architecture
• Domain Events
• Event Sourcing
• Command and Query Responsibility Segregation (CQRS) pattern
• Event Stream Processing
• Messaging
• Enterprise Service Bus
• Actors
• Enterprise Integration Architecture (EIA)
Domain Events "It's really become clear to me in the last couple of years that we need a new building block and that is the Domain Events" -- Eric Evans, 2009
Domain Events "Domain Events represent the state of entities at a given time when an important event occurred and decouple subsystems with event streams. Domain Events give us clearer, more expressive models in those cases." -- Eric Evans, 2009
Domain Events "State transitions are an important part of our problem space and should be modeled within our domain." -- GregYoung, 2008
Event Sourcing
• Every state change is materialized in an Event
• All Events are sent to an EventProcessor
• EventProcessor stores all Events in an Event Log
• System can be reset and the Event Log replayed
• No need for ORM, just persist the Events
• Many different EventListeners can be added to the EventProcessor (or listen directly on the Event Log)
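A minimal sketch of this in Java (Event, Withdrawn and EventProcessor are illustrative names, not a real framework): state changes are appended to a log, listeners are notified, and state can be rebuilt by replaying the log.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

interface Event {}

final class Withdrawn implements Event {
    final long amount;
    Withdrawn(long amount) { this.amount = amount; }
}

class EventProcessor {
    private final List<Event> eventLog = new ArrayList<>();           // the Event Log
    private final List<Consumer<Event>> listeners = new ArrayList<>();

    void subscribe(Consumer<Event> listener) { listeners.add(listener); }

    void publish(Event e) {
        eventLog.add(e);                                 // persist the Event, not the state
        for (Consumer<Event> l : listeners) l.accept(e); // notify all EventListeners
    }

    // Reset the system and rebuild state by replaying the log.
    void replay(Consumer<Event> listener) {
        for (Event e : eventLog) listener.accept(e);
    }
}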
"A single model cannot be appropriate for reporting, searching and transactional behavior." -- GregYoung, 2008 Command and Query Responsibility Segregation (CQRS) pattern
(Diagrams: bidirectional dependencies vs. unidirectional event flow)
CQRS in a nutshell
• All state changes are represented by Domain Events
• Aggregate roots receive Commands and publish Events
• Reporting (query database) is updated as a result of the published Events
• All Queries from Presentation go directly to Reporting; the Domain is not involved
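A hedged sketch of the command and query sides, building on the EventProcessor and Withdrawn types sketched above (again, all names invented): the aggregate validates Commands and publishes Events; the reporting model is updated only by those Events and is what Presentation queries.

class AccountAggregate {
    private long balance;
    private final EventProcessor processor;

    AccountAggregate(long openingBalance, EventProcessor processor) {
        this.balance = openingBalance;
        this.processor = processor;
    }

    // Command side: validate against current state, then publish a Domain Event.
    void withdraw(long amount) {
        if (amount > balance) throw new IllegalArgumentException("insufficient funds");
        balance -= amount;
        processor.publish(new Withdrawn(amount));
    }
}

class WithdrawalReport {
    private long totalWithdrawn = 0;

    WithdrawalReport(EventProcessor processor) {
        // Query side: kept up to date only via published Events.
        processor.subscribe(e -> {
            if (e instanceof Withdrawn) totalWithdrawn += ((Withdrawn) e).amount;
        });
    }

    long totalWithdrawn() { return totalWithdrawn; } // Presentation queries this directly
}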
CQRS (diagram copyright Axon Framework)
CQRS: Benefits • Fully encapsulated domain that only exposes behavior • Queries do not use the domain model • No object-relational impedance mismatch • Bullet-proof auditing and historical tracing • Easy integration with external systems • Performance and scalability
Event Stream Processing
select * from Withdrawal(amount>=200).win:length(5)
(Esper EPL: selects Withdrawal events with amount >= 200, over a sliding window of the last five events)
Event Stream Processing Products • Esper (Open Source) • StreamBase • RuleCast
Messaging • Publish-Subscribe • Point-to-Point • Store-forward • Request-Reply
Store-Forward Durability, event log, auditing etc.
Request-Reply
E.g. AMQP's 'replyTo' header
Messaging
• Standards:
• AMQP
• JMS
• Products:
• RabbitMQ (AMQP)
• ActiveMQ (JMS)
• Tibco
• MQSeries
• etc.
ESB products • ServiceMix (Open Source) • Mule (Open Source) • Open ESB (Open Source) • Sonic ESB • WebSphere ESB • Oracle ESB • Tibco • BizTalk Server
Actors • Fire-forget • Async send • Fire-And-Receive-Eventually • Async send + wait on Future for reply
Enterprise Integration Patterns
Enterprise Integration Patterns Apache Camel • More than 80 endpoints • XML (Spring) DSL • Scala DSL
Compute Grids Parallel execution • Divide and conquer 1. Split up job in independent tasks 2. Execute tasks in parallel 3. Aggregate and return result • MapReduce - Master/Worker
Compute Grids Parallel execution • Automatic provisioning • Load balancing • Fail-over • Topology resolution
Compute Grids Products • Platform • DataSynapse • Google MapReduce • Hadoop • GigaSpaces • GridGain
Load balancing
• Random allocation
• Round robin allocation
• Weighted allocation
• Dynamic load balancing
• Least connections
• Least server CPU
• etc.
Load balancing • DNS Round Robin (simplest) • Ask DNS for IP for host • Get a new IP every time • Reverse Proxy (better) • Hardware Load Balancing
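A round-robin allocator is a few lines of Java (illustrative sketch; real load balancers add health checks, weights, and sticky sessions on top):

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin allocation: each request is handed to the next
// server in the list, wrapping around (server names are made up).
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> servers) { this.servers = servers; }

    public String pick() {
        // floorMod guards against int overflow going negative
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}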
Load balancing products • Reverse Proxies: • Apache mod_proxy (OSS) • HAProxy (OSS) • Squid (OSS) • Nginx (OSS) • Hardware Load Balancers: • BIG-IP • Cisco
Parallel Computing
• UE: Unit of Execution
• Process
• Thread
• Coroutine
• Actor
• Patterns:
• SPMD Pattern
• Master/Worker Pattern
• Loop Parallelism Pattern
• Fork/Join Pattern
• MapReduce Pattern
SPMD Pattern
• Single Program Multiple Data
• Very generic pattern, used in many other patterns
• Use a single program for all the UEs
• Use the UE's ID to select different pathways through the program, e.g.:
• Branching on ID
• Using the ID in a loop index to split loops
• Keep interactions between UEs explicit
Master/Worker
• Good scalability
• Automatic load-balancing
• How to detect termination?
• Bag of tasks is empty
• Poison pill
• What if we bottleneck on a single queue?
• Use multiple work queues
• Work stealing
• What about fault tolerance?
• Use "in-progress" queue
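A sketch of Master/Worker with poison-pill termination in plain Java (illustrative only; a real system would use an ExecutorService or a grid product): the master fills a shared bag of tasks, then enqueues one pill per worker to signal termination.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MasterWorker {
    private static final Runnable POISON_PILL = () -> {};

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
        int workers = 4;

        for (int w = 0; w < workers; w++) {
            new Thread(() -> {
                try {
                    while (true) {
                        Runnable task = tasks.take();
                        if (task == POISON_PILL) return; // termination detected
                        task.run();
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }).start();
        }

        for (int i = 0; i < 20; i++) {          // master: fill the bag of tasks
            int n = i;
            tasks.put(() -> System.out.println("task " + n));
        }
        for (int w = 0; w < workers; w++) tasks.put(POISON_PILL); // one pill per worker
    }
}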
Loop Parallelism
• Workflow:
1. Find the loops that are bottlenecks
2. Eliminate coupling between loop iterations
3. Parallelize the loop
• If too few iterations to pull its weight:
• Merge loops
• Coalesce nested loops
• OpenMP: omp parallel for
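A sketch of step 3 in Java (assuming the iterations have already been decoupled): once iterations are independent, the loop can be split across cores, much like OpenMP's omp parallel for.

import java.util.stream.IntStream;

public class ParallelLoop {
    public static void main(String[] args) {
        double[] a = new double[1_000_000];
        // Sequential version: for (int i = 0; i < a.length; i++) a[i] = f(i);
        // Iterations write to disjoint indices, so they can run in parallel:
        IntStream.range(0, a.length).parallel().forEach(i -> a[i] = f(i));
    }

    static double f(int i) { return Math.sqrt(i) * 2.0; }
}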
What if task creation can’t be handled by: • parallelizing loops (Loop Parallelism) • putting them on work queues (Master/Worker)
What if task creation can’t be handled by: • parallelizing loops (Loop Parallelism) • putting them on work queues (Master/Worker) Enter Fork/Join
Fork/Join
• Use when the relationship between tasks is simple
• Good for recursive data processing
• Can use work-stealing
1. Fork: Tasks are dynamically created
2. Join: Tasks are later terminated and data aggregated
Fork/Join
• Direct task/UE mapping
• 1-1 mapping between Task/UE
• Problem: dynamic UE creation is expensive
• Indirect task/UE mapping
• Pool the UEs
• Control (constrain) the resource allocation
• Automatic load balancing
Java 7 ParallelArray (Fork/Join DSL) Fork/Join
Fork/Join
Java 7 ParallelArray (Fork/Join DSL):

ParallelArray students = new ParallelArray(fjPool, data);
double bestGpa = students.withFilter(isSenior)
                         .withMapping(selectGpa)
                         .max();
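For comparison, a sketch using the plain JSR 166 Fork/Join API in java.util.concurrent: recursively fork subtasks until a threshold, then join and aggregate the results.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) { this.data = data; this.from = from; this.to = to; }

    @Override
    protected Long compute() {
        if (to - from <= 1_000) {               // small enough: compute directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) / 2;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                             // run left half asynchronously
        return right.compute() + left.join();    // compute right, then join left
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        long sum = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 1000000
    }
}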
MapReduce
• Originated in a Google paper from 2004
• Used internally @ Google
• Variation of Fork/Join
• Work is divided upfront, not dynamically
• Usually distributed
• Normally used for massive data crunching
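A toy in-memory word count showing the two phases (map: emit words; reduce: sum per key); nothing distributed about it, but the shape is the same one Hadoop runs across a cluster.

import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountMapReduce {
    public static void main(String[] args) {
        String[] docs = {"the quick fox", "the lazy dog", "the fox"};

        Map<String, Long> counts = Arrays.stream(docs)
                .flatMap(doc -> Arrays.stream(doc.split(" ")))                  // map: emit words
                .collect(Collectors.groupingBy(w -> w, Collectors.counting())); // reduce: sum per key

        System.out.println(counts); // {the=3, quick=1, fox=2, lazy=1, dog=1}
    }
}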
MapReduce Products
• Hadoop (OSS), used @ Yahoo
• Amazon Elastic MapReduce
• Many NOSQL DBs utilize it for searching/querying
Parallel Computing products • MPI • OpenMP • JSR166 Fork/Join • java.util.concurrent • ExecutorService, BlockingQueue etc. • ProActive Parallel Suite • CommonJ WorkManager (JEE)
Stability Patterns
• Timeouts
• Circuit Breaker
• Let-it-crash
• Fail fast
• Bulkheads
• Steady State
• Throttling
Timeouts
Always use timeouts (if possible):
• object.wait(timeout)
• reentrantLock.tryLock(timeout, timeUnit)
• blockingQueue.poll(timeout, timeUnit) / offer(..)
• futureTask.get(timeout, timeUnit)
• socket.setSoTimeout(timeout)
• etc.
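A hedged example of a bounded wait on a slow dependency (slowRemoteCall is a stand-in for any remote call): give up after the timeout and degrade instead of hanging.

import java.util.concurrent.*;

public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> reply = pool.submit(() -> slowRemoteCall());
        try {
            System.out.println(reply.get(2, TimeUnit.SECONDS)); // bounded wait
        } catch (TimeoutException e) {
            reply.cancel(true);             // give up and free the thread
            System.out.println("fallback"); // degrade instead of hanging
        } finally {
            pool.shutdown();
        }
    }

    static String slowRemoteCall() throws InterruptedException {
        Thread.sleep(5_000); // simulates a slow or hung dependency
        return "response";
    }
}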
Let it crash
• Embrace failure as a natural state in the life-cycle of the application
• Instead of trying to prevent failure, manage it
• Process supervision
• Supervisor hierarchies (from Erlang)
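A deliberately naive supervisor sketch in Java (Erlang/Akka supervisors are far richer; this only shows the restart-on-crash idea behind the strategies on the next slides):

public class Supervisor {
    public static void supervise(Runnable worker) {
        new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    worker.run(); // a normal exit ends supervision
                    return;
                } catch (RuntimeException e) {
                    // OneForOne-style strategy: restart just this worker
                    System.err.println("worker crashed, restarting: " + e);
                }
            }
        }).start();
    }
}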
Restart Strategy: OneForOne (restart only the failed process)
Restart Strategy: AllForOne (restart all processes under the same supervisor)
Fail fast
• Avoid "slow responses"
• Separate:
• SystemError - resources not available
• ApplicationError - bad user input etc.
• Verify resource availability before starting an expensive task
• Validate input immediately
Bulkheads
• Partition and tolerate failure in one part
• Redundancy
• Applies to threads as well:
• A separate pool for admin tasks, so they can run even when all other threads are blocked
Steady State
• Clean up after yourself
• Logging:
• RollingFileAppender (log4j)
• logrotate (Unix)
• Scribe - server for aggregating streaming log data
• Always put logs on a separate disk
Throttling
• Maintain a steady pace
• Count requests
• If limit reached, back-off (drop, raise error)
• Queue requests
• Used in, for example, Staged Event-Driven Architecture (SEDA)
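A sketch of throttling with a semaphore in Java (illustrative; the Throttle and RejectedException names are invented): callers that can't get a permit are rejected immediately, which keeps the pace steady under load.

import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

public class Throttle {
    private final Semaphore permits;

    public Throttle(int maxConcurrent) { this.permits = new Semaphore(maxConcurrent); }

    public <T> T call(Callable<T> request) throws Exception {
        if (!permits.tryAcquire()) throw new RejectedException("throttled, try later");
        try {
            return request.call();
        } finally {
            permits.release(); // always give the permit back
        }
    }

    static class RejectedException extends RuntimeException {
        RejectedException(String msg) { super(msg); }
    }
}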
Client-side consistency • Strong consistency • Weak consistency • Eventually consistent • Never consistent
Client-side Eventual Consistency levels
• Causal consistency
• Read-your-writes consistency (important)
• Session consistency
• Monotonic read consistency (important)
• Monotonic write consistency
Server-side consistency
N = number of nodes that store replicas of the data
W = number of replicas that must acknowledge an update before the update completes
R = number of replicas contacted when a data object is read
Server-side consistency
W + R > N => strong consistency
W + R <= N => eventual consistency
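Worked example: with N = 3 replicas, choosing W = 2 and R = 2 gives W + R = 4 > 3, so every read quorum overlaps every write quorum and is guaranteed to see the latest write (strong consistency). Choosing W = 1 and R = 1 gives W + R = 2 <= 3, so a read may hit a replica that has not yet received the update (eventual consistency).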