Bloom Filter Optimizations: Double Hashing and Blocked Filters
Double Hashing Optimization
Production Bloom filter implementations employ two key engineering optimizations to reduce CPU cost and improve cache locality: double hashing and blocked (or partitioned) filters. Standard Bloom filter theory calls for k independent hash functions, but computing k separate hashes per operation is expensive. Double hashing derives all k bit positions from just two base hash values: compute h1(x) and h2(x) once, then take the i-th index as (h1(x) + i * h2(x)) mod m for i = 0, ..., k-1. This cuts hashing CPU cost by a factor of roughly k/2 (two hash computations instead of k) while maintaining near-optimal false positive rates for practical filter sizes. Use fast non-cryptographic hash functions such as Murmur3 or xxHash; cryptographic hashes are unnecessary and waste CPU.
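As a minimal sketch of the technique, the following Python Bloom filter derives all k indices from two base hashes. It uses the standard library's blake2b purely as a stand-in for a fast non-cryptographic hash like Murmur3 or xxHash (which require third-party packages); the class and function names are illustrative, not from any particular library.

```python
import hashlib

def base_hashes(item: bytes) -> tuple[int, int]:
    """Derive two 64-bit base hashes h1, h2 from a single digest.

    blake2b is a stand-in here; production code would use Murmur3/xxHash.
    """
    d = hashlib.blake2b(item, digest_size=16).digest()
    h1 = int.from_bytes(d[:8], "little")
    # Force h2 odd so the stride cycles the whole table when m is a power of two.
    h2 = int.from_bytes(d[8:], "little") | 1
    return h1, h2

def bit_positions(item: bytes, k: int, m: int) -> list[int]:
    """k bit indices via double hashing: (h1 + i*h2) mod m, i = 0..k-1."""
    h1, h2 = base_hashes(item)
    return [(h1 + i * h2) % m for i in range(k)]

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)  # m-bit array, byte-packed

    def add(self, item: bytes) -> None:
        for p in bit_positions(item, self.k, self.m):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: bytes) -> bool:
        # All k bits set => "probably present"; any bit clear => definitely absent.
        return all(self.bits[p >> 3] >> (p & 7) & 1
                   for p in bit_positions(item, self.k, self.m))
```

Note that only one digest is computed per operation; the loop in bit_positions is pure integer arithmetic, which is the source of the CPU savings.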
Blocked Bloom Filters
Blocked Bloom filters address CPU cache locality. A standard Bloom filter with k = 7 and a large bit array (hundreds of megabytes) scatters its memory accesses across cache lines, causing cache misses on nearly every lookup. Blocked filters partition the m-bit array into small blocks, typically sized to fit in a single CPU cache line (64 bytes = 512 bits). When inserting or checking an element, you first hash to select one block, then apply the k hash functions only within that block. All k bit accesses hit the same cache line, which is loaded once. This improves CPU cache hit rates and enables SIMD (Single Instruction, Multiple Data) vectorization. In high-throughput storage engines handling tens of thousands of queries per second per core, blocked Bloom filters can deliver 1.5 to 2x higher read throughput than standard Bloom filters at equivalent false positive rates, though matching that rate typically costs slightly more space to compensate for uneven load across blocks.
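The block-selection step can be sketched as follows, again in Python with blake2b standing in for a fast hash and with illustrative names; each block is modeled as a single 512-bit integer so that every insert and lookup touches exactly one "cache line" of state.

```python
import hashlib

BLOCK_BITS = 512  # one 64-byte cache line

class BlockedBloomFilter:
    def __init__(self, num_blocks: int, k: int):
        self.num_blocks, self.k = num_blocks, k
        self.blocks = [0] * num_blocks  # each int models one 512-bit block

    def _block_and_bits(self, item: bytes):
        d = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:], "little") | 1  # odd stride
        # First hash selects the block; double hashing then derives
        # k positions *within* that block, so all accesses share one line.
        block = h1 % self.num_blocks
        bits = [(h1 + i * h2) % BLOCK_BITS for i in range(self.k)]
        return block, bits

    def add(self, item: bytes) -> None:
        block, bits = self._block_and_bits(item)
        for b in bits:
            self.blocks[block] |= 1 << b

    def __contains__(self, item: bytes) -> bool:
        block, bits = self._block_and_bits(item)
        return all(self.blocks[block] >> b & 1 for b in bits)
```

In a real implementation the block would be a fixed 64-byte buffer and the k bit tests would compile down to a handful of SIMD mask operations against that one cache line; the Python version only illustrates the access pattern.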
Partitioned Bloom Filters
Partitioned Bloom filters divide the m-bit array into k equal slices and set exactly one bit per slice using k hash functions. Each element therefore sets k bits across k different partitions, which bounds overlap between hash functions and simplifies implementation. Partitioned filters typically achieve false positive rates similar to classic Bloom filters at comparable space and are easier to vectorize. The choice between blocked and partitioned designs depends on workload: blocked filters excel in read-heavy scenarios with tight CPU cache budgets, while partitioned filters simplify reasoning about bit distribution and are easier to implement correctly in parallel.