Partitioning & Sharding • Secondary Indexes with Partitioning (Hard, ⏱️ ~3 min)
How Do You Implement and Maintain Global Secondary Indexes at Scale?
Implementing global secondary indexes at scale requires a robust asynchronous pipeline that propagates base writes to index partitions without distributed transactions. The standard pattern uses a durable change log or stream. When a base write commits, the system appends an entry to a partition-level commit log or change stream (similar to database transaction logs or Kafka). An indexer service consumes this stream, transforms each change into index upserts or deletes, and writes to the appropriate index partition, determined by hash or range partitioning the indexed term. Each index entry includes the term value, the primary key, optional projected attributes, and a monotonic sequence number or timestamp for last-writer-wins conflict resolution. Index writes are idempotent, allowing retries without duplication. DynamoDB GSIs implement this pattern with internal change streams and background indexer tasks per table.
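The indexer's apply step can be sketched as follows. This is an illustrative in-memory model, not DynamoDB's actual implementation: `ChangeEvent`, `IndexPartition`, and `target_partition` are hypothetical names, and a real system would persist entries and tombstones durably.

```python
from dataclasses import dataclass

@dataclass
class ChangeEvent:
    op: str           # "upsert" or "delete"
    term: str         # indexed attribute value (for a delete, the old value)
    primary_key: str
    projected: dict   # optional projected attributes for covering queries
    seq: int          # monotonic sequence number from the base commit log

class IndexPartition:
    def __init__(self):
        # (term, primary_key) -> (seq, projected); projected=None is a tombstone
        self.entries = {}

    def apply(self, event: ChangeEvent) -> bool:
        """Idempotent last-writer-wins apply; returns True if the event won."""
        key = (event.term, event.primary_key)
        current = self.entries.get(key)
        # Stale or duplicate events lose: only a strictly newer seq is applied,
        # so retries and stream replays are safe.
        if current is not None and current[0] >= event.seq:
            return False
        if event.op == "delete":
            # Keep a tombstone so a replayed older upsert cannot resurrect it.
            self.entries[key] = (event.seq, None)
        else:
            self.entries[key] = (event.seq, event.projected)
        return True

def target_partition(term: str, num_partitions: int) -> int:
    # Hash partitioning of the indexed term; range partitioning also works.
    return hash(term) % num_partitions
```

The sequence-number comparison is what makes retries safe: replaying the same event, or an older one, is a no-op.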
Capacity planning must account for write amplification and replication traffic. With K global indexes, each base write that modifies indexed attributes generates 1 + K write operations. At 50,000 base writes per second, 3 GSIs, and 40 percent of writes modifying indexed attributes, the index subsystem performs approximately 0.4 × 50,000 × 3 = 60,000 index writes per second. Replication amplifies this further in multi-region setups: replicating base data plus K indexes to R regions generates (1 + K) × R replication streams. Tail latency with covering indexes is typically 10 to 30 milliseconds at p99 when the query hits a single healthy index partition, but can spike past 100 milliseconds during partition splits, backfills, or hotspots, so provision capacity headroom of 20 to 50 percent above steady-state load.
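The arithmetic above reduces to a small capacity model. This is a back-of-envelope sketch using the paragraph's own figures; the function names are illustrative.

```python
# With K indexes and a fraction f of base writes touching indexed attributes:
#   index traffic      = f * W * K     writes/s
#   total write ops    = W * (1 + f*K) writes/s
#   replication streams = (1 + K) * R  across R regions
def index_writes_per_sec(base_wps: float, k_indexes: int, f_indexed: float) -> float:
    return f_indexed * base_wps * k_indexes

def total_write_ops(base_wps: float, k_indexes: int, f_indexed: float) -> float:
    return base_wps * (1 + f_indexed * k_indexes)

def replication_streams(k_indexes: int, regions: int) -> int:
    return (1 + k_indexes) * regions

# Figures from the text: 50,000 base writes/s, 3 GSIs, 40% indexed.
print(index_writes_per_sec(50_000, 3, 0.4))  # 60000.0 index writes/s
print(replication_streams(3, 2))             # 8 streams across 2 regions
```

Multiply the steady-state result by 1.2 to 1.5 to apply the 20 to 50 percent headroom the text recommends.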
Observability is critical for operating indexes at scale. Expose metrics for index lag (the time or sequence-number delta between base writes and index updates), per-index write throughput and error rates, and hot-key detection (track the top N terms by request rate). Alert when lag exceeds query freshness SLOs, for example if lag surpasses 1 second for more than 5 minutes. Implement read repair: when a query fetches a base record via an index entry, verify the record still matches the indexed predicate, and if not, asynchronously delete or update the stale index entry. For backfills, use a separate throttled scanner that reads the base table and upserts index entries, running in parallel with the live change stream but at lower priority to avoid impacting online traffic. DynamoDB backfills new GSIs at a rate proportional to provisioned write capacity, typically 3,000 to 5,000 items per second per 1,000 write capacity units, and surfaces backfill progress via CloudWatch metrics.
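The read-repair check described above fits naturally on the query path. A minimal sketch, assuming hypothetical `fetch_base` and `enqueue_repair` callables supplied by the caller (neither is a real DynamoDB API):

```python
def query_via_index(index_entries, fetch_base, enqueue_repair, predicate):
    """Yield base records that still satisfy the indexed predicate;
    enqueue async repair for any stale index entry encountered."""
    for entry in index_entries:
        record = fetch_base(entry["primary_key"])
        if record is None or not predicate(record):
            # Base record no longer matches the indexed term: the entry is
            # stale (write race or lost index update). Repair it off the
            # read path so query latency is unaffected.
            enqueue_repair(entry)
            continue
        yield record
```

The repair is enqueued rather than performed inline, so a burst of stale entries degrades freshness, not query latency.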
💡 Key Takeaways
•Asynchronous index propagation via durable change streams avoids distributed transactions but introduces eventual consistency. DynamoDB GSI lag is typically 100 to 500 milliseconds, extending to seconds under load spikes or during backfill.
•Write amplification with K indexes is approximately 1 + 0.4K for workloads where 40 percent of writes modify indexed attributes. At 100,000 writes per second and 5 GSIs, expect roughly 100,000 × (1 + 0.4 × 5) = 300,000 write operations per second total, multiplying capacity cost significantly.
•Index entries must include sequence numbers or timestamps for last writer wins semantics. Without versioning, concurrent updates can cause index entries to reflect stale base record states, requiring periodic reconciliation or read repair.
•Backfilling a new index on a multi-terabyte table can take hours and consumes write capacity proportional to the scan rate. DynamoDB backfills at roughly 3,000 to 5,000 items per second per 1,000 provisioned write capacity units. Until backfill completes, queries against the new index return incomplete results.
•Hot-key detection and mitigation is essential. Monitor the top N terms by request rate; when a term exceeds a threshold (e.g., 1,000 QPS to a single index partition), apply salting: write entries under (hash(primary key) mod B, term, primary key) so a single term's entries spread across B buckets, and fan queries for that term out to all B buckets. (Hashing the term alone would map every entry for the hot term to the same bucket.)
•Read repair maintains index correctness over time. When fetching a base record via index, verify the record matches the indexed predicate; if not, enqueue an asynchronous index update or delete to fix the inconsistency, preventing unbounded divergence.
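The salting scheme from the takeaways can be sketched as follows. This is an illustrative model, assuming the bucket is derived from the primary key so that one hot term's entries spread across buckets; `read_bucket` is a hypothetical per-bucket query callable.

```python
import hashlib

def salt_bucket(term: str, primary_key: str, num_buckets: int) -> int:
    # Salt on (term, primary_key): hashing the term alone would put every
    # entry for a hot term in the same bucket, defeating the purpose.
    digest = hashlib.md5(f"{term}|{primary_key}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

def salted_index_key(term: str, primary_key: str, num_buckets: int):
    # Bucket prefix first, so each bucket maps to its own index partition.
    return (salt_bucket(term, primary_key, num_buckets), term, primary_key)

def fan_out_query(term: str, num_buckets: int, read_bucket):
    """Query all B buckets for the term and merge the results."""
    results = []
    for bucket in range(num_buckets):
        results.extend(read_bucket(bucket, term))
    return results
```

Writes stay cheap (one bucket per entry), while reads pay a fan-out of B, which is the latency trade-off the examples below quantify.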
📌 Examples
DynamoDB GSI implementation: Base table writes commit and append to an internal change stream. Indexer tasks consume the stream, hash the indexed attribute (e.g., status) to determine the target GSI partition, and write index entries with a sequence number. Queries on status route to the hashed GSI partition, fetch matching primary keys, then batch fetch base records. Index lag is monitored via CloudWatch; teams alert if lag exceeds 2 seconds.
Adding a GSI to a 3 TB DynamoDB table with 10,000 provisioned write capacity units: backfill scans the base table at roughly 40,000 items per second, taking about 5 hours. During backfill, queries on the new GSI return partial results. Once complete, the GSI switches to live change stream consumption, and lag drops to sub-second.
Hot key mitigation: a ride-sharing service indexes trips by city. The city = San Francisco index partition receives 20,000 QPS against 5,000 QPS of capacity, causing throttling. Engineers salt the city index into 8 buckets: index key = (hash(trip_id) mod 8, San Francisco, trip_id). Queries fan out to 8 partitions at 2,500 QPS each, eliminating throttling. Query latency rises from 10 milliseconds to 18 milliseconds due to the fan-out.
Read repair: a query fetches order_id = 999 from the status = pending GSI, then retrieves the base record, which shows status = completed. The system enqueues an asynchronous task to delete the stale index entry. Over time, read repair corrects divergence caused by write races or lost index updates.