Caching • Bloom Filters • Medium • ⏱️ ~3 min
Bloom Filter Optimizations: Double Hashing and Blocked Filters
Production Bloom filter implementations employ two key engineering optimizations to reduce CPU cost and improve cache locality: double hashing and blocked (or partitioned) filters. Standard Bloom filter theory calls for k independent hash functions, but computing k separate hashes per operation is expensive, even with non-cryptographic functions. Double hashing derives all k bit positions from just two base hash values: compute h1(x) and h2(x) once, then take index_i = (h1(x) + i * h2(x)) mod m for i = 0 to k − 1. This cuts hashing CPU cost dramatically, and the analysis by Kirsch and Mitzenmacher shows it preserves near-optimal false positive rates for practical filter sizes. Major systems such as Cassandra and RocksDB pair double hashing with fast non-cryptographic hash functions like Murmur to achieve high throughput.
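The double hashing scheme can be sketched in a few lines. This is a minimal illustration, not a production implementation: it uses `hashlib.md5` from the Python standard library as a stand-in for a fast non-cryptographic hash like Murmur3, splitting its 128-bit digest into the two base values h1 and h2 (a common trick for getting two hashes from one computation). The class and function names are hypothetical.

```python
import hashlib

def bloom_indices(key: bytes, m: int, k: int) -> list[int]:
    """Derive k bit positions from two base hashes (double hashing).
    md5 is only a stdlib stand-in; real systems use Murmur3/xxHash."""
    digest = hashlib.md5(key).digest()            # one 128-bit hash...
    h1 = int.from_bytes(digest[:8], "little")     # ...split into two
    h2 = int.from_bytes(digest[8:], "little") | 1 # odd h2 stays coprime with power-of-two m
    return [(h1 + i * h2) % m for i in range(k)]

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8 + 1)

    def add(self, key: bytes) -> None:
        for idx in bloom_indices(key, self.m, self.k):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, key: bytes) -> bool:
        # True may be a false positive; False is always definitive
        return all(self.bits[idx // 8] >> (idx % 8) & 1
                   for idx in bloom_indices(key, self.m, self.k))
```

Note that only one digest is computed per operation, yet all k probe positions differ, which is exactly where the roughly k/2 reduction in hashing cost comes from.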
Blocked Bloom filters address CPU cache locality. A standard Bloom filter with k = 7 and a large bit array (hundreds of megabytes) scatters its memory accesses across cache lines, causing multiple cache misses on every lookup. Blocked Bloom filters partition the m-bit array into small blocks, typically sized to fit a single CPU cache line (64 bytes = 512 bits). To insert or check an element, you first hash to select one block, then apply the k hash functions only within that block's 512 bits. All k bit accesses hit the same cache line, which is loaded once. This improves CPU cache hit rates and enables Single Instruction Multiple Data (SIMD) vectorization. In high-throughput storage engines like RocksDB handling tens of thousands of queries per second per core, blocked Bloom filters can deliver 1.5 to 2 times higher read throughput than standard Bloom filters at equivalent false positive rates, primarily by reducing memory stalls.
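The two-level lookup described above (hash to a block, then probe only inside it) can be sketched as follows. This is a hedged sketch of the general technique, not RocksDB's actual layout: each 512-bit block is modeled as a Python integer, and `hashlib.md5` again stands in for a fast non-cryptographic hash.

```python
import hashlib

BLOCK_BITS = 512  # one 64-byte cache line

class BlockedBloomFilter:
    """Sketch of a blocked Bloom filter: h1 selects a block, then
    double hashing derives k positions within that block, so every
    lookup touches a single cache line."""
    def __init__(self, num_blocks: int, k: int = 7):
        self.num_blocks, self.k = num_blocks, k
        self.blocks = [0] * num_blocks  # each int models one 512-bit block

    def _positions(self, key: bytes):
        d = hashlib.md5(key).digest()   # stand-in for Murmur3/xxHash
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:], "little") | 1
        block = h1 % self.num_blocks
        # all k probes land inside the same 512-bit block
        bits = [(h1 + i * h2) % BLOCK_BITS for i in range(self.k)]
        return block, bits

    def add(self, key: bytes) -> None:
        block, bits = self._positions(key)
        for b in bits:
            self.blocks[block] |= 1 << b

    def might_contain(self, key: bytes) -> bool:
        block, bits = self._positions(key)
        return all(self.blocks[block] >> b & 1 for b in bits)
```

The trade-off to be aware of: confining an element's k bits to one block slightly raises the false positive rate versus a flat array of the same size, which is why blocked filters are usually provisioned with marginally more bits per key.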
Partitioned Bloom filters are a related optimization: divide the m-bit array into k equal slices and set exactly one bit per slice, one per hash function. Each element therefore sets k bits across k different partitions, which bounds overlap between hash functions and simplifies implementation. Partitioned filters achieve false positive rates close to classic Bloom filters at comparable space, and the regular layout makes vectorization easier. The choice between the two designs depends on workload: blocked filters excel in read-heavy scenarios where CPU cache is tight, while partitioned filters simplify reasoning about bit distribution and are easier to implement correctly in parallel.
💡 Key Takeaways
• Double hashing computes only two base hash values, h1(x) and h2(x), then derives k indices as (h1 + i * h2) mod m for i = 0 to k − 1. This cuts hash computation cost by roughly a factor of k/2 while maintaining near-optimal false positive rates, and is widely used in Cassandra, RocksDB, and other production systems.
• Blocked Bloom filters partition the bit array into cache-line-sized blocks (typically 64 bytes = 512 bits) and perform all k bit operations within the selected block. This keeps every memory access within one cache line, improving CPU cache hit rates and enabling SIMD vectorization for higher throughput.
• In high queries-per-second (QPS) storage engines, blocked Bloom filters can deliver 1.5 to 2 times higher read throughput than standard Bloom filters at equivalent false positive rates. The gain comes from reducing CPU memory stalls caused by scattered cache line accesses across large bit arrays.
• Partitioned Bloom filters divide the m-bit array into k equal slices and set one bit per slice, ensuring a predictable bit distribution. This design simplifies parallel implementation and reasoning about the false positive rate, and often matches classic Bloom filter space efficiency.
• Use fast non-cryptographic hash functions like Murmur3 or xxHash with independent seeds for h1 and h2. Cryptographic hashes like Secure Hash Algorithm (SHA) are unnecessary and waste CPU: well-distributed non-cryptographic hashes are sufficient and orders of magnitude faster, which matters for filters checked on every read-path operation.
• Validate hash distribution with entropy tests during development. Poor or correlated hash functions create hot bits (bits set far more often than others), inflating the false positive rate beyond theoretical predictions. Independent hash seeds and testing with real workload data prevent this failure mode.
📌 Examples
RocksDB uses blocked Bloom filters by default with 10 bits per key. In a read workload with 100,000 queries per second and 70% negative lookups, blocked filters improve throughput from approximately 80,000 to 120,000 queries per second per core by keeping all bit accesses in one cache line, reducing memory stalls.
Cassandra implements double hashing with the Murmur3 hash to generate k = 7 bit positions from two base hashes. This cuts hash computation cost by approximately 70% compared to computing 7 independent hashes, while maintaining the target 1% false positive rate for typical SSTable sizes of 100 million to 1 billion keys.
A high throughput key value store using partitioned Bloom filters divides a 1 GB bit array into 7 partitions (one per hash function). Each insert sets exactly 7 bits, one per partition. This design allows parallel filter checks across partitions using SIMD instructions, achieving sub-microsecond lookup times even for billion key datasets.