Caching • Bloom Filters • Medium • ⏱️ ~3 min
Bloom Filter Optimizations: Double Hashing and Blocked Filters
Production Bloom filter implementations employ two key engineering optimizations to reduce CPU cost and improve cache locality: double hashing and blocked (or partitioned) filters. Standard Bloom filter theory calls for k independent hash functions, but computing k separate hashes per operation is expensive, even with non-cryptographic functions. Double hashing derives all k bit positions from just two base hash values: compute h1(x) and h2(x) once, then take index_i = (h1(x) + i * h2(x)) mod m for i = 0 to k − 1. This cuts hashing CPU cost dramatically, and the analysis by Kirsch and Mitzenmacher shows it preserves near-optimal false positive rates for practical filter sizes. Major systems such as Cassandra and RocksDB pair double hashing with fast non-cryptographic hash functions like Murmur to achieve high throughput.
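The double hashing scheme can be sketched in a few lines. This is a minimal illustration, not a production implementation: it uses `hashlib.md5` from the Python standard library as a stand-in for a fast non-cryptographic hash like Murmur3, splitting its 128-bit digest into the two base values h1 and h2 (a common trick for getting two hashes from one computation). The class and function names are hypothetical.

```python
import hashlib

def bloom_indices(key: bytes, m: int, k: int) -> list[int]:
    """Derive k bit positions from two base hashes (double hashing).
    md5 is only a stdlib stand-in; real systems use Murmur3/xxHash."""
    digest = hashlib.md5(key).digest()            # one 128-bit hash...
    h1 = int.from_bytes(digest[:8], "little")     # ...split into two
    h2 = int.from_bytes(digest[8:], "little") | 1 # odd h2 stays coprime with power-of-two m
    return [(h1 + i * h2) % m for i in range(k)]

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8 + 1)

    def add(self, key: bytes) -> None:
        for idx in bloom_indices(key, self.m, self.k):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, key: bytes) -> bool:
        # True may be a false positive; False is always definitive
        return all(self.bits[idx // 8] >> (idx % 8) & 1
                   for idx in bloom_indices(key, self.m, self.k))
```

Note that only one digest is computed per operation, yet all k probe positions differ, which is exactly where the roughly k/2 reduction in hashing cost comes from.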
Blocked Bloom filters address CPU cache locality. A standard Bloom filter with k = 7 and a large bit array (hundreds of megabytes) scatters its memory accesses across cache lines, causing multiple cache misses on every lookup. Blocked Bloom filters partition the m-bit array into small blocks, typically sized to fit a single CPU cache line (64 bytes = 512 bits). To insert or check an element, you first hash to select one block, then apply the k hash functions only within that block's 512 bits. All k bit accesses hit the same cache line, which is loaded once. This improves CPU cache hit rates and enables Single Instruction Multiple Data (SIMD) vectorization. In high-throughput storage engines like RocksDB handling tens of thousands of queries per second per core, blocked Bloom filters can deliver 1.5 to 2 times higher read throughput than standard Bloom filters at equivalent false positive rates, primarily by reducing memory stalls.
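The two-level lookup described above (hash to a block, then probe only inside it) can be sketched as follows. This is a hedged sketch of the general technique, not RocksDB's actual layout: each 512-bit block is modeled as a Python integer, and `hashlib.md5` again stands in for a fast non-cryptographic hash.

```python
import hashlib

BLOCK_BITS = 512  # one 64-byte cache line

class BlockedBloomFilter:
    """Sketch of a blocked Bloom filter: h1 selects a block, then
    double hashing derives k positions within that block, so every
    lookup touches a single cache line."""
    def __init__(self, num_blocks: int, k: int = 7):
        self.num_blocks, self.k = num_blocks, k
        self.blocks = [0] * num_blocks  # each int models one 512-bit block

    def _positions(self, key: bytes):
        d = hashlib.md5(key).digest()   # stand-in for Murmur3/xxHash
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:], "little") | 1
        block = h1 % self.num_blocks
        # all k probes land inside the same 512-bit block
        bits = [(h1 + i * h2) % BLOCK_BITS for i in range(self.k)]
        return block, bits

    def add(self, key: bytes) -> None:
        block, bits = self._positions(key)
        for b in bits:
            self.blocks[block] |= 1 << b

    def might_contain(self, key: bytes) -> bool:
        block, bits = self._positions(key)
        return all(self.blocks[block] >> b & 1 for b in bits)
```

The trade-off to be aware of: confining an element's k bits to one block slightly raises the false positive rate versus a flat array of the same size, which is why blocked filters are usually provisioned with marginally more bits per key.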
Partitioned Bloom filters are a related optimization: divide the m-bit array into k equal slices and set exactly one bit per slice, one per hash function. Each element therefore sets k bits across k different partitions, which bounds overlap between hash functions and simplifies implementation. Partitioned filters achieve false positive rates close to classic Bloom filters at comparable space, and the regular layout makes vectorization easier. The choice between the two designs depends on workload: blocked filters excel in read-heavy scenarios where CPU cache is tight, while partitioned filters simplify reasoning about bit distribution and are easier to implement correctly in parallel.
💡 Key Takeaways
• Double hashing computes only two base hash values, h1(x) and h2(x), then derives k indices as (h1 + i * h2) mod m for i = 0 to k − 1. This cuts hash computation cost by roughly a factor of k/2 while maintaining near-optimal false positive rates, and is widely used in Cassandra, RocksDB, and other production systems.
• Blocked Bloom filters partition the bit array into cache-line-sized blocks (typically 64 bytes = 512 bits) and perform all k bit operations within the selected block. This keeps every memory access within one cache line, improving CPU cache hit rates and enabling SIMD vectorization for higher throughput.
• In high queries-per-second (QPS) storage engines, blocked Bloom filters can deliver 1.5 to 2 times higher read throughput than standard Bloom filters at equivalent false positive rates. The gain comes from reducing CPU memory stalls caused by scattered cache line accesses across large bit arrays.
• Partitioned Bloom filters divide the m-bit array into k equal slices and set one bit per slice, ensuring a predictable bit distribution. This design simplifies parallel implementation and reasoning about the false positive rate, and often matches classic Bloom filter space efficiency.
• Use fast non-cryptographic hash functions like Murmur3 or xxHash with independent seeds for h1 and h2. Cryptographic hashes like Secure Hash Algorithm (SHA) are unnecessary and waste CPU: well-distributed non-cryptographic hashes are sufficient and orders of magnitude faster, which matters for filters checked on every read-path operation.
• Validate hash distribution with entropy tests during development. Poor or correlated hash functions create hot bits (bits set far more often than others), inflating the false positive rate beyond theoretical predictions. Independent hash seeds and testing with real workload data prevent this failure mode.
📌 Examples
RocksDB uses blocked Bloom filters by default with 10 bits per key. In a read workload with 100,000 queries per second and 70% negative lookups, blocked filters improve throughput from approximately 80,000 to 120,000 queries per second per core by keeping all bit accesses in one cache line, reducing memory stalls.
Cassandra implements double hashing with the Murmur3 hash to generate k = 7 bit positions from two base hashes. This cuts hash computation cost by approximately 70% compared to computing 7 independent hashes, while maintaining the target 1% false positive rate for typical SSTable sizes of 100 million to 1 billion keys.
A high throughput key value store using partitioned Bloom filters divides a 1 GB bit array into 7 partitions (one per hash function). Each insert sets exactly 7 bits, one per partition. This design allows parallel filter checks across partitions using SIMD instructions, achieving sub-microsecond lookup times even for billion key datasets.