Local Reconstruction Codes: Reducing Repair Bandwidth

Local Reconstruction Codes (LRCs) extend traditional MDS codes such as Reed-Solomon to reduce repair bandwidth and repair time for single-shard failures. Standard Reed-Solomon must read k shards to reconstruct any one missing shard, which generates significant cross-cluster or cross-datacenter traffic. LRC addresses this by dividing the k data shards into local groups and adding one local parity per group, plus global parity shards computed over all the data shards. A single shard failure can then be repaired by reading only the surviving members of its local group (typically 4 to 6 shards) rather than k shards. Microsoft Azure Storage pioneered production LRC deployment.

For example, an LRC(12,2,2) scheme has 12 data shards divided into 2 local groups of 6 shards each, with 1 local parity per group (2 total) plus 2 global parities, for 16 total shards. Rebuilding one lost data shard reads only 6 shards (the 5 surviving data shards in its group plus that group's local parity), rather than the 12 shards Reed-Solomon would read. This cuts repair bandwidth by 50% and, when repair is bandwidth-bound, reduces repair time roughly in proportion. The tradeoff is slightly higher storage overhead: LRC(12,2,2) has 33% overhead (16/12), versus 25% (15/12) for a comparable 12+3 Reed-Solomon code.

Repair time directly impacts durability because it sets the window of vulnerability: the period during which additional failures could make the data unrecoverable. If repair bandwidth is limited or cross-domain egress is expensive (common in cloud environments with inter-AZ or inter-region transfer costs), LRC can provide better effective durability despite nominally similar fault tolerance. Azure reported that LRC reduced their repair traffic by approximately 50% compared to Reed-Solomon, enabling faster repairs and a lower blast radius during failures. The local parity groups also provide locality: repairs for single failures stay within a rack or AZ rather than pulling data across the entire cluster.
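The single-shard repair path described above can be illustrated with a minimal sketch. It assumes the local parity is a plain byte-wise XOR of the group's data shards, which is a simplification: production LRCs compute parities with Galois-field coefficients, and the helper name `xor_shards` and the tiny 4-byte shards are hypothetical.

```python
from functools import reduce


def xor_shards(shards):
    """Byte-wise XOR of equal-length shards (simplified stand-in for parity math)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*shards))


# Hypothetical local group: 6 data shards of 4 bytes each.
group = [bytes([i] * 4) for i in range(1, 7)]
local_parity = xor_shards(group)

# Simulate losing one data shard, then rebuild it from the local group only.
lost_index = 2
survivors = [s for i, s in enumerate(group) if i != lost_index]
rebuilt = xor_shards(survivors + [local_parity])

# 5 surviving data shards + 1 local parity = 6 reads, none outside the group.
shards_read = len(survivors) + 1
assert rebuilt == group[lost_index]
```

The point of the sketch is the read set: the other local group and the global parities are never touched for a single failure.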
Implementation complexity increases: you must manage both local and global parity computation, track local group membership, and handle cases where local repair is not possible, such as a data shard and its local parity being lost together, by falling back to global reconstruction. LRC works best for workloads with high repair frequency or expensive cross-domain bandwidth. For pure capacity optimization, or when repair bandwidth is plentiful, standard Reed-Solomon MDS codes remain simpler and achieve lower overhead.
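One way to picture the local-versus-global decision is a small repair planner. This is a sketch under stated assumptions, not how any particular system does it: the shard tuples and the `plan_repair` function are hypothetical, and the planner only chooses a repair path without checking whether the overall failure pattern is decodable.

```python
from collections import Counter


def plan_repair(lost):
    """Classify each lost shard as needing 'local' or 'global' repair.

    Shards are hypothetical tuples: ("data", group, index),
    ("local_parity", group), or ("global_parity", index).
    """
    # Count losses per local group; global parities belong to no group.
    losses_per_group = Counter(s[1] for s in lost if s[0] != "global_parity")
    plan = {}
    for shard in lost:
        if shard[0] == "global_parity":
            plan[shard] = "global"  # recomputed from all data shards
        elif losses_per_group[shard[1]] == 1:
            plan[shard] = "local"   # the only loss in its group
        else:
            plan[shard] = "global"  # group has 2+ losses, local parity is not enough
    return plan


print(plan_repair([("data", 0, 3)]))
print(plan_repair([("data", 0, 3), ("local_parity", 0)]))
```

A real scheduler would likely be smarter, for instance repairing one shard globally and then the second one locally, but the group-membership bookkeeping is the same.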
💡 Key Takeaways
Local Reconstruction Codes (LRCs) divide k data shards into local groups with one local parity per group, plus global parity shards; a single-shard failure is reconstructed from its local group only
LRC(12,2,2) with 2 local groups of 6 shards repairs a single failure by reading 6 shards instead of 12, cutting repair bandwidth by 50% versus Reed-Solomon
Storage overhead increases slightly: LRC(12,2,2) has 33% overhead (16 total shards) versus 12+3 Reed-Solomon at 25% overhead (15 total shards)
Microsoft Azure Storage uses LRC in production and reported an approximately 50% reduction in repair traffic, enabling faster repairs and lower cross-datacenter egress costs
Local parity provides locality: single-shard repairs stay within a rack or AZ rather than pulling data across the entire cluster, reducing blast radius during failures
Choose LRC when repair bandwidth is expensive (cross-AZ, cross-region) or when repair frequency is high; choose Reed-Solomon for pure capacity optimization with plentiful bandwidth
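The overhead and repair-read figures in these takeaways follow directly from the code parameters. The sketch below reproduces them; `lrc_stats` and `rs_stats` are hypothetical helper names, and it assumes the k data shards split evenly across the local groups.

```python
def lrc_stats(k, local_parities, global_parities):
    """Shard count, overhead, and single-failure read count for LRC(k, l, r)."""
    total = k + local_parities + global_parities
    group_size = k // local_parities  # data shards per local group
    return {
        "total_shards": total,
        "overhead_pct": 100 * (total - k) / k,
        # (group_size - 1) surviving data shards + 1 local parity
        "single_repair_reads": group_size,
    }


def rs_stats(k, m):
    """Same figures for a k+m Reed-Solomon code."""
    return {
        "total_shards": k + m,
        "overhead_pct": 100 * m / k,
        "single_repair_reads": k,  # any k surviving shards
    }


lrc = lrc_stats(12, 2, 2)
rs = rs_stats(12, 3)
print(lrc)
print(rs)
```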
📌 Examples
Azure Storage LRC: 12 data shards in 2 groups of 6, with 1 local parity per group plus 2 global parities (16 total); single failure repair reads 6 shards instead of 12
Repair bandwidth comparison: repairing one 10 TB shard with Reed-Solomon 12+3 reads 120 TB (12 shards × 10 TB); LRC(12,2,2) reads 60 TB (6 local group shards × 10 TB), saving 60 TB of traffic
Cross-AZ cost: at $0.01 per GB of cross-AZ transfer, saving 60 TB (60,000 GB) per repair saves $600 in egress costs, assuming every repair read crosses an AZ boundary in both schemes; with 1000 repairs per month, LRC saves $600,000 monthly
Locality example: placing each local group in one AZ means single failures repair within the AZ using local bandwidth; global reconstruction is only needed when a group loses more than one shard, such as a data shard together with its local parity
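The bandwidth and cost arithmetic in these examples can be checked with a few lines. The constants mirror the figures above; the variable names are illustrative, decimal units (1 TB = 1000 GB) are assumed, and prices are kept in integer cents to avoid floating-point rounding.

```python
SHARD_TB = 10
GB_PER_TB = 1000             # decimal units, an assumption
PRICE_CENTS_PER_GB = 1       # $0.01 per GB cross-AZ transfer
REPAIRS_PER_MONTH = 1000

rs_read_tb = 12 * SHARD_TB   # Reed-Solomon 12+3 reads 12 shards
lrc_read_tb = 6 * SHARD_TB   # LRC(12,2,2) reads 6 shards in the local group
saved_tb = rs_read_tb - lrc_read_tb

saved_per_repair_usd = saved_tb * GB_PER_TB * PRICE_CENTS_PER_GB // 100
saved_per_month_usd = saved_per_repair_usd * REPAIRS_PER_MONTH

print(f"saved per repair: {saved_tb} TB, ${saved_per_repair_usd}")
print(f"saved per month:  ${saved_per_month_usd}")
```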