
Choosing Erasure Coding Schemes: k, p, and Stripe Geometry

Selecting k (data shards) and p (parity shards) balances storage overhead, fault tolerance, latency, and repair cost. Storage overhead is p/k: 4+2 has 50% overhead, 10+4 has 40%, and 17+3 has 17.6%. Larger k lowers overhead but worsens read fanout during degraded mode and increases read-modify-write (RMW) cost for partial writes; smaller k improves latency and simplifies operations at the cost of higher overhead. Common production patterns are 4+2, 6+3, 8+4, 10+4, and 17+3, chosen based on workload characteristics.

Fault tolerance is p: you can lose up to p shards without data loss, and higher p increases durability but also overhead. The key insight is that durability depends not just on p but on the repair window and failure rate. A 6+3 scheme with fast repairs (2 hour window) can achieve higher durability than 10+4 with slow repairs (48 hour window), because the probability of p+1 failures within 2 hours is far lower than within 48 hours, even though 10+4 has higher nominal fault tolerance.

Chunk size (the size of each shard, typically 1 to 8 MB) trades metadata overhead against parallelism. Larger chunks reduce metadata (fewer stripes to track) and syscall overhead but decrease parallel IO opportunities and waste space for small objects; smaller chunks improve failure isolation (one bad sector corrupts less data) and pipeline depth but increase metadata cost. Most production systems use 4 to 8 MB chunks. Stripe alignment matters: objects much smaller than k × chunk size suffer from partial stripes with wasted padding.

Practical decision rules:
Use 3x replication for the hot tier: small objects (under 1 MB), low latency requirements (p99 under 10ms), and write heavy workloads.
Use 6+3 or 8+4 EC for the warm tier: medium objects (1 to 100 MB), moderate latency tolerance (p99 under 50ms), and read heavy or balanced workloads.
Use 10+4, 12+4, or 17+3 EC for the cold tier: large objects (over 100 MB), high latency tolerance (p99 over 100ms), and append only archives.
Keep k under 10 for latency sensitive reads to limit tail amplification; larger k is acceptable for cold archives.
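The arithmetic behind these tradeoffs is easy to sanity-check. Below is a rough back-of-envelope sketch (not part of the original material): it computes the p/k overhead for each scheme and a crude annual stripe-loss estimate that assumes independent shard failures at a hypothetical 2% annualized failure rate per shard, counting a stripe as lost only when more than p shards fail inside one repair window. Real durability models also account for correlated failures, latent sector errors, and rebuild load.

```python
from math import comb

def storage_overhead(k, p):
    """Parity overhead as a fraction of stored data: p/k."""
    return p / k

def annual_stripe_loss_prob(k, p, repair_hours, afr=0.02):
    """Crude durability estimate under independent shard failures.

    A stripe is lost if more than p of its k+p shards fail within one
    repair window. afr is an assumed 2% annualized failure rate per shard.
    """
    n = k + p
    q = afr * repair_hours / 8760  # per-shard failure probability within one window
    # Binomial tail: probability of p+1 or more failures in a single window
    loss_per_window = sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(p + 1, n + 1))
    windows_per_year = 8760 / repair_hours
    return loss_per_window * windows_per_year

for k, p, repair in [(4, 2, 24), (6, 3, 2), (8, 4, 24), (10, 4, 48), (17, 3, 24)]:
    print(f"{k}+{p}: overhead {storage_overhead(k, p):.1%}, "
          f"~{annual_stripe_loss_prob(k, p, repair):.1e} stripe-loss probability/yr "
          f"with {repair}h repairs")
```

With these assumed numbers, 6+3 repaired within 2 hours comes out more than an order of magnitude more durable per year than 10+4 repaired within 48 hours, which is the repair-window effect described above.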
💡 Key Takeaways
Storage overhead is p/k: 4+2 has 50%, 10+4 has 40%, 17+3 has 17.6%; larger k reduces overhead but increases read fanout latency and RMW costs for small writes
Fault tolerance p controls how many shards can fail; but effective durability depends on repair window: 6+3 with 2 hour repairs beats 10+4 with 48 hour repairs due to lower p+1 failure probability
Common production schemes: 4+2 and 6+3 for warm tier, 8+4 and 10+4 for balanced warm/cold, 12+4 and 17+3 for cold archives with large objects
Chunk size (1 to 8 MB typical) trades metadata overhead against parallelism; larger chunks reduce metadata but decrease parallel IO; most systems use 4 to 8 MB chunks
Decision rule: 3x replication for hot tier, small objects (under 1 MB), p99 under 10ms; 6+3 or 8+4 EC for warm tier, 1 to 100 MB objects, p99 under 50ms; 10+4 or 17+3 for cold tier, over 100 MB, p99 over 100ms
Keep k under 10 for latency sensitive reads to limit max-of-k tail amplification (a degraded read waits on the slowest of k shard fetches; see the sketch below); larger k is acceptable for cold archives where throughput matters more than latency
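To see why large k hurts latency sensitive reads, here is a toy Monte Carlo sketch of max-of-k tail amplification. The per-shard latency model (roughly 5 ms typical with a 1% chance of a 50 ms stall) is invented purely for illustration, not taken from the text.

```python
import random

def p99_of_max_of_k(k, trials=100_000, mean_ms=5.0, stall_ms=50.0, stall_frac=0.01):
    """Simulate a read that must wait for the slowest of k shard fetches.

    Per-shard latency is a hypothetical mixture: exponential with mean
    mean_ms, plus a stall_ms penalty with probability stall_frac.
    """
    def shard_latency():
        base = random.expovariate(1.0 / mean_ms)
        return base + (stall_ms if random.random() < stall_frac else 0.0)

    samples = sorted(max(shard_latency() for _ in range(k)) for _ in range(trials))
    return samples[int(0.99 * trials)]

for k in (1, 6, 10, 17):
    print(f"k={k:2d}: p99 of slowest-of-{k} fetch ≈ {p99_of_max_of_k(k):.1f} ms")
```

Because the chance that at least one of the k fetches hits the slow path grows as 1 − 0.99^k, the p99 of the slowest-of-k read climbs quickly with k, which is why small k is preferred when reads are latency sensitive.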
📌 Examples
Hot tier example: 3x replication for 10 KB to 1 MB objects with p99 under 5ms for user facing reads; EC RMW would inflate latency to 50+ ms
Warm tier example: 8+4 EC for 10 MB objects with p99 under 30ms; storage overhead 50% versus 200% for replication; acceptable degraded read latency
Cold tier example: 17+3 EC for 1 GB video files with p99 under 200ms; storage overhead 17.6%; reads are sequential and latency insensitive; optimizing for $/TB
Chunk size tradeoff: 1 MB chunks split a 10 MB object into 10 chunks, inflating metadata; 8 MB chunks yield 1.25 chunks, so the partial last chunk wastes padding; 4 MB is a balanced choice (see the geometry sketch below)
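A small geometry sketch makes the chunk size tradeoff concrete. It assumes a hypothetical full-stripe layout with k = 6 data chunks per stripe and treats the unused remainder of the final stripe as padding; real systems may instead store a short last stripe, and the specific numbers are illustrative only.

```python
from math import ceil

MB = 1024 * 1024

def stripe_geometry(object_bytes, k, chunk_bytes):
    """Chunk and stripe counts for one object under a full-stripe layout
    where each stripe carries k data chunks of chunk_bytes (parity excluded)."""
    data_chunks = ceil(object_bytes / chunk_bytes)      # metadata scales with this
    stripes = ceil(data_chunks / k)
    padding = stripes * k * chunk_bytes - object_bytes  # wasted space if padded out
    return data_chunks, stripes, padding

for chunk_mb in (1, 4, 8):
    chunks, stripes, padding = stripe_geometry(10 * MB, k=6, chunk_bytes=chunk_mb * MB)
    print(f"{chunk_mb} MB chunks, k=6: {chunks} data chunks, {stripes} stripe(s), "
          f"{padding / MB:.0f} MB padding if the last stripe is padded")
```

Smaller chunks mean more chunks (and more metadata) per object but less padding waste; larger chunks mean fewer chunks to track but a mostly empty final stripe for a 10 MB object, which is the tradeoff the example above describes.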