
Data Partitioning and Replication in Key-Value Stores

At cluster scale, key-value systems partition data by hashing the key and replicate each partition to multiple nodes for availability. The hash function determines which partition owns a key, typically via modulo sharding or consistent hashing with virtual nodes. Consistent hashing minimizes data movement when nodes join or leave: each physical node manages 100 to 256 virtual nodes spread around a hash ring, so losing one physical node redistributes only about 1/N of the keyspace across the remaining nodes.

Replication is usually asynchronous for low latency. Systems seeking stronger guarantees use quorums with N replicas, R read acknowledgments, and W write acknowledgments. Requiring R + W > N ensures consistent reads because the read and write quorums overlap on at least one node holding the latest data. A common production configuration is N=3, R=2, W=2, which tolerates one node failure while maintaining consistency at the cost of higher write latency (each write must wait for two acknowledgments). Setting R=1 and W=1 gives the lowest latency but only eventual consistency.

Dynamo-style systems use sloppy quorum and hinted handoff to remain available during failures. When a primary replica is down, writes go to a fallback node with a hint to forward the data once the primary recovers. This maintains write availability but creates temporary inconsistency. Anti-entropy processes using Merkle trees periodically scan and repair divergent replicas. Twitter's Manhattan employs automatic sharding with multi-datacenter placement and background repair to serve millions of requests per second.

Request routing requires maintaining a partition map. Client-side routing lets applications hash keys and connect directly to the right node, avoiding extra hops. Server-side routing uses stateless proxy layers such as Twemproxy or Mcrouter that hold the routing map and forward requests. The routing map updates via gossip protocols or coordination services when nodes join, leave, or fail. Meta's memcache deployment uses Mcrouter proxy layers for client-side sharding, achieving sub-millisecond access to regional pools.
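To make the ring mechanics concrete, here is a minimal Python sketch of consistent hashing with virtual nodes. The `HashRing` class, the use of MD5, and the default of 150 virtual nodes per physical node are illustrative assumptions, not taken from any particular system.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes_per_node=150):
        self.vnodes_per_node = vnodes_per_node
        self._hashes = []   # sorted virtual-node positions on the ring
        self._owners = {}   # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # Any uniformly distributed hash works; MD5 is used here only for illustration.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each physical node owns many ring positions, so adding or removing
        # it shifts only ~1/N of the keyspace to or from its neighbors.
        for i in range(self.vnodes_per_node):
            pos = self._hash(f"{node}#vnode-{i}")
            self._owners[pos] = node
            bisect.insort(self._hashes, pos)

    def remove_node(self, node: str) -> None:
        self._hashes = [h for h in self._hashes if self._owners[h] != node]
        self._owners = {h: n for h, n in self._owners.items() if n != node}

    def get_node(self, key: str) -> str:
        # Walk clockwise from the key's hash to the next virtual node.
        idx = bisect.bisect(self._hashes, self._hash(key))
        if idx == len(self._hashes):
            idx = 0  # wrap around the ring
        return self._owners[self._hashes[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.get_node("user:42"))  # deterministic owner for this key
```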
💡 Key Takeaways
Consistent hashing with virtual nodes (100 to 256 per physical node) minimizes data movement when nodes join or leave, redistributing only about 1/N of the keyspace per node change
Quorum replication with N=3, R=2, W=2 ensures consistency (R + W > N guarantees that read and write quorums overlap; see the sketch after this list) but increases write latency; R=1, W=1 gives the lowest latency with only eventual consistency
Sloppy quorum and hinted handoff maintain write availability during node failures by using fallback nodes, then repairing via Merkle-tree-based anti-entropy processes
Client-side routing avoids extra network hops by hashing keys directly to nodes; server-side proxies like Mcrouter or Twemproxy simplify clients but add one hop of latency
DynamoDB adaptive capacity automatically redistributes throughput from cold to hot partitions; without it, per-partition limits can throttle requests even when table-level capacity remains available
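The quorum overlap property can be illustrated with a toy coordinator. This is a sketch under simplifying assumptions (in-memory replicas, client-supplied versions, no failure handling); the `QuorumStore` name and API are hypothetical, not any real system's interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Versioned:
    version: int
    value: str


class QuorumStore:
    """Toy quorum replication over in-memory replicas (illustrative only)."""

    def __init__(self, n=3, r=2, w=2):
        assert r + w > n, "R + W must exceed N for read/write quorums to overlap"
        self.n, self.r, self.w = n, r, w
        self.replicas = [dict() for _ in range(n)]  # one dict per replica node

    def put(self, key: str, value: str, version: int) -> None:
        # A real coordinator sends to all N replicas and waits for W acks;
        # here we write only the first W replicas to model a partial write.
        for replica in self.replicas[: self.w]:
            replica[key] = Versioned(version, value)

    def get(self, key: str) -> Optional[str]:
        # Read any R replicas; because R + W > N, at least one of them holds
        # the latest acknowledged write. Return the highest-versioned value.
        responses = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        if not responses:
            return None
        return max(responses, key=lambda v: v.version).value


store = QuorumStore(n=3, r=2, w=2)
store.put("cart:alice", "3 items", version=1)
print(store.get("cart:alice"))  # "3 items": read quorum overlaps write quorum
```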
📌 Examples
Amazon Dynamo uses N=3 replicas with R=W=2 for the shopping cart service, maintaining 99.9th-percentile latencies under 300 ms during node failures via sloppy quorum and hinted handoff (see the sketch after this list)
Meta's memcache proxy layer (Mcrouter) routes to regional pools with lease tokens to prevent thundering herds, achieving over 90% hit rates and sub-millisecond access
Twitter's Manhattan automatically shards trillions of entities across nodes, with background repair and multi-datacenter replication, while handling millions of requests per second
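For the sloppy-quorum behavior referenced above, the following sketch models hinted handoff: a fallback node accepts a write on behalf of a down primary and forwards it once the primary recovers. The `Node` class and its methods are illustrative assumptions, not the Dynamo implementation.

```python
class Node:
    """One storage node that can also hold hinted writes for peers (sketch)."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (intended_owner, key, value) held while the owner is down

    def write(self, key, value, hint_for=None):
        if hint_for is None:
            self.data[key] = value
        else:
            # Sloppy quorum: accept the write on behalf of a down primary
            # and remember which node it really belongs to.
            self.hints.append((hint_for, key, value))

    def deliver_hints(self, cluster):
        # Once the primary is back, forward the hinted writes and drop them.
        remaining = []
        for owner, key, value in self.hints:
            target = cluster[owner]
            if target.up:
                target.write(key, value)
            else:
                remaining.append((owner, key, value))
        self.hints = remaining


cluster = {name: Node(name) for name in ("a", "b", "c", "d")}
cluster["a"].up = False                       # primary for this key is down
cluster["d"].write("k1", "v1", hint_for="a")  # fallback takes the write plus a hint

cluster["a"].up = True                        # primary recovers
cluster["d"].deliver_hints(cluster)
print(cluster["a"].data)                      # {'k1': 'v1'}
```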