What is Erasure Coding and How Does It Achieve High Durability?
Erasure Coding (EC) is a data protection technique that splits an object into k data shards and generates p parity shards, creating n = k + p total shards. Using a Maximum Distance Separable (MDS) code such as Reed-Solomon, the original object can be reconstructed from any k of the n shards. This means a stripe can lose up to p shards without data loss; data is actually lost only when more than p shards fail before repair completes.
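The "any k of n" property is easiest to see in code. The following is a minimal toy sketch, not a production codec: it implements a Reed-Solomon-style MDS code over the prime field GF(257), chosen here only because its arithmetic is plain modular arithmetic (real systems use GF(2^8) with table-driven libraries such as ISA-L or Jerasure). All function names are hypothetical.

```python
# Toy MDS erasure code in the Reed-Solomon family: data bytes become
# polynomial coefficients, and each of the n shards holds evaluations
# at a distinct point. Any k shards determine the polynomials, hence
# the original data.

P = 257  # prime modulus; every byte value 0..255 is a field element

def encode(data: bytes, k: int, n: int) -> list[list[int]]:
    """Split data into groups of k bytes; each group is a degree-(k-1)
    polynomial evaluated at points 1..n, giving one symbol per shard."""
    assert len(data) % k == 0, "pad data to a multiple of k first"
    shards = [[] for _ in range(n)]
    for i in range(0, len(data), k):
        coeffs = data[i:i + k]
        for x in range(1, n + 1):
            y = 0
            for c in reversed(coeffs):   # Horner's rule: sum(c_j * x^j) mod P
                y = (y * x + c) % P
            shards[x - 1].append(y)
    return shards

def _invert(mat: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan inversion of a square matrix over GF(P)."""
    k = len(mat)
    aug = [row[:] + [int(i == j) for j in range(k)] for i, row in enumerate(mat)]
    for col in range(k):
        piv = next(r for r in range(col, k) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, P)         # modular inverse (Python >= 3.8)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(aug[r][c] - f * aug[col][c]) % P for c in range(2 * k)]
    return [row[k:] for row in aug]

def reconstruct(available: dict[int, list[int]], k: int) -> bytes:
    """Rebuild the original bytes from ANY k surviving shards, given as
    {shard_index: symbols}. Solves the Vandermonde system per position."""
    pts = sorted(available)[:k]
    vand = [[pow(i + 1, j, P) for j in range(k)] for i in pts]
    inv = _invert(vand)
    out = bytearray()
    for pos in range(len(available[pts[0]])):
        ys = [available[i][pos] for i in pts]
        out.extend(sum(inv[r][c] * ys[c] for c in range(k)) % P for r in range(k))
    return bytes(out)

# 4 data + 2 parity: any 2 shard losses are survivable
data = b"erasure coding demo!"            # 20 bytes, a multiple of k=4
shards = encode(data, k=4, n=6)
survivors = {i: s for i, s in enumerate(shards) if i not in (0, 3)}  # lose p=2
assert reconstruct(survivors, k=4) == data
```

Losing any third shard in this 4+2 example would leave fewer than k=4 survivors, and the linear system would no longer determine the data; that is exactly the p+1 loss boundary described above.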
The storage efficiency advantage is dramatic. A 6+3 configuration carries only 50% overhead (9 total shards for 6 data shards), while a 17+3 scheme like the one Backblaze uses carries just 17.6% (3 parity shards for 17 data shards). Compare this to 3x replication, which imposes 200% overhead. Backblaze's public durability model shows that their 17+3 scheme, with a 0.405% annual shard failure rate and a 6.5-day replacement window, achieves approximately 11 nines of durability (99.999999999% per year).
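The overhead arithmetic is easy to verify directly; a quick sketch (the scheme list simply reflects the configurations cited above, with 3x replication modeled as k=1, p=2):

```python
# Storage overhead of a k+p erasure-coding scheme is p/k extra bytes per
# data byte; n-way replication stores n-1 extra copies of every byte.
schemes = {"6+3 EC": (6, 3), "17+3 EC": (17, 3), "3x replication": (1, 2)}
for name, (k, p) in schemes.items():
    print(f"{name}: {p / k:.1%} overhead, {1 + p / k:.2f}x raw capacity per byte")
# 6+3 EC: 50.0% overhead, 1.50x raw capacity per byte
# 17+3 EC: 17.6% overhead, 1.18x raw capacity per byte
# 3x replication: 200.0% overhead, 3.00x raw capacity per byte
```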
Durability is probabilistic and depends on three key factors: per-shard Annual Failure Rate (AFR), the repair time window, and failure independence. The "window of vulnerability" is the critical period from when a shard fails until repair completes. If p additional shards in the same stripe fail during this window, data is lost. ByteByteGo's comparison illustrates this: with a 0.81% annual node failure rate, 4+2 EC achieves 11 nines of durability while 3x replication reaches only 6 nines.
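That relationship can be turned into a back-of-the-envelope calculation. The sketch below is an assumed, simplified binomial model (the `annual_nines` helper is illustrative, not Backblaze's or ByteByteGo's actual methodology): a stripe is lost when, after one shard fails, at least p more of the remaining n-1 shards fail before the repair window closes.

```python
from math import comb, log10

def annual_nines(k: int, p: int, afr: float, repair_days: float) -> float:
    """Approximate annual durability (in nines) of a single k+p stripe.

    Simplified binomial model, assuming independent shard failures:
    any of the n shards fails first (about n * afr events per year),
    and data is lost if at least p of the remaining n - 1 shards also
    fail before the repair window closes.
    """
    n = k + p
    q = afr * repair_days / 365                  # P(a shard fails during repair)
    p_loss = n * afr * comb(n - 1, p) * q ** p   # expected stripe losses per year
    return -log10(p_loss)

# Parameters cited above: 17+3, 0.405% shard AFR, 6.5-day replacement window
print(f"17+3 stripe: ~{annual_nines(17, 3, 0.00405, 6.5):.1f} nines")
# -> ~10.5 nines with this crude model; Backblaze's more detailed
#    published model arrives at ~11 nines for the same parameters.
```

The gap between ~10.5 and 11 nines is a reminder that durability figures are model outputs: small changes in assumptions about repair time distributions or failure correlation can shift the result by a nine or more.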
The tradeoff is that EC increases durability but can reduce availability. During failures, reads must fan out to more nodes, and degraded reads require reconstruction from k shards, which is slower than simply reading a replica. For hot, latency-sensitive workloads with small objects, 3x replication typically wins. For large objects, cold or warm data, or cost-sensitive archives, EC is the default choice.
💡 Key Takeaways
• Erasure Coding splits data into k data shards plus p parity shards; any k of n total shards can reconstruct the original object using MDS codes like Reed-Solomon
• Storage overhead is p/k: 17+3 EC has 17.6% overhead versus 200% for 3x replication, cutting redundancy overhead by more than 10x
• Durability depends on Annual Failure Rate (AFR), repair window, and failure independence; Backblaze achieves 11 nines with 17+3 EC, 0.405% AFR, and 6.5-day repairs
• Data loss occurs only when more than p shards fail before repair completes; the window of vulnerability between failure detection and repair completion is the critical period
• EC trades capacity savings for higher CPU (encoding/decoding), more network fanout (reading k shards), and potentially lower availability during degraded mode
• Choose EC for large objects, cold or warm data, and cost-sensitive archives; choose 3x replication for hot, small, latency-sensitive workloads
📌 Examples
Backblaze public durability: 17+3 Reed-Solomon with 0.405% AFR and 6.5-day repair achieves 11 nines annual durability with 17.6% storage overhead
ByteByteGo comparison: 4+2 EC achieves 11 nines durability while 3x replication achieves only 6 nines, both with 0.81% annual node failure rate
Meta's f4 warm/cold BLOB storage uses Reed-Solomon EC for photos at massive scale to cut storage costs versus triplication
Industry object stores use EC across failure domains (racks, Availability Zones) with common schemes like 6+3, 8+4, 10+4 achieving 11+ nines at 25 to 60% overhead