Chunking Strategies: Fixed vs Semantic
The Strategy Decision:
Once you decide to chunk documents, you face a critical implementation choice: how exactly do you split the text? The two dominant approaches are fixed-length chunking and semantic chunking, each with measurably different performance characteristics.
Fixed-Length Chunking:
This approach splits documents every N tokens regardless of content structure. For example, a 5,000-token legal contract becomes exactly 10 chunks of 500 tokens each. The implementation is trivial: tokenize the document, group the tokens into fixed-size windows, and optionally add 10 to 30 percent overlap between adjacent chunks.
The advantage is speed and predictability. At an ingestion throughput of 100 million log entries per day, fixed-length chunking processes documents in microseconds with zero parsing complexity. Token budgeting is trivial because every chunk has an identical size. However, you frequently cut across semantic boundaries: a table might be split so that headers land in one chunk and data rows in another, or a legal definition might be separated from the clause that references it.
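The core loop is simple enough to show directly. The sketch below assumes the tiktoken tokenizer and a 500-token window with 50 tokens (10 percent) of overlap; the function name and defaults are illustrative rather than taken from any particular library.

```python
# Minimal fixed-length chunker: split a document every `chunk_size` tokens,
# carrying `overlap` tokens between adjacent chunks to reduce boundary loss.
# Assumes the tiktoken package; any tokenizer with encode/decode works the same way.
import tiktoken

def fixed_length_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap  # advance by chunk size minus overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(enc.decode(window))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks
```

Run against the 5,000-token contract above, this yields 11 chunks rather than 10, because the 10 percent overlap adds roughly one extra window.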
Semantic Chunking:
This approach respects document structure by splitting on natural boundaries: section headers, paragraph breaks, or embedding-based topic shifts. A design doc with 5 sections becomes 5 chunks of varying size (200 to 1,200 tokens). Some systems use a small language model to detect when the next paragraph shifts topic, based on embedding distance.
Semantic chunking typically improves answer quality by 5 to 15 percent in evaluations because chunks are self-contained and coherent. A compliance policy chunk will include both the rule and its exceptions. However, variable chunk sizes complicate context budgeting: you might plan for 20 chunks but only fit 12 because several are unusually large. Ingestion is also 50x to 100x slower due to parsing overhead.
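A minimal structural version of semantic chunking is sketched below: it splits on Markdown-style section headers and keeps each section as one variable-size chunk. The regex and function name are assumptions for illustration; embedding-based topic-shift detection would replace the header test with a distance threshold between consecutive paragraph embeddings.

```python
# Minimal semantic chunker: split on section headers so each chunk is a
# self-contained logical unit of varying size (e.g. 200 to 1,200 tokens).
import re

def semantic_chunks(text: str) -> list[str]:
    # Treat Markdown-style headers ("# ...", "## ...", ...) as hard section boundaries.
    sections = re.split(r"\n(?=#{1,6} )", text)
    return [s.strip() for s in sections if s.strip()]
```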
Ingestion Performance: fixed-length ≈ 10 μs per document; semantic ≈ 500 μs per document.
⚠️ Common Pitfall: Semantic chunking that produces chunks ranging from 50 to 2,000 tokens makes retrieval unpredictable. You may retrieve one massive chunk that consumes your entire budget or many tiny fragments that lack context. Hybrid strategies cap semantic chunks at a max size (for example, 800 tokens) to bound variance.
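One way to implement that hybrid cap, continuing the illustrative sketches above, is to run semantic splitting first and re-split any oversized section with the fixed-length chunker:

```python
# Hybrid strategy: semantic chunks bounded by a maximum size. Sections that fit
# under `max_tokens` stay intact; oversized ones fall back to fixed windows.
# Reuses fixed_length_chunks() and semantic_chunks() from the sketches above.
import tiktoken

def hybrid_chunks(text: str, max_tokens: int = 800, overlap: int = 80) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    bounded = []
    for chunk in semantic_chunks(text):
        if len(enc.encode(chunk)) <= max_tokens:
            bounded.append(chunk)  # small enough: keep the semantic unit whole
        else:
            # Too large: bound variance by re-splitting into fixed windows.
            bounded.extend(fixed_length_chunks(chunk, chunk_size=max_tokens, overlap=overlap))
    return bounded
```

Capping at 800 tokens keeps the worst-case chunk to a predictable fraction of the context budget while leaving typical sections intact.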
💡 Key Takeaways
✓Fixed-length chunking processes documents in microseconds but cuts across semantic boundaries, potentially splitting tables, definitions, or code blocks mid-concept
✓Semantic chunking improves answer quality by 5 to 15 percent by preserving logical units, but it is 50x to 100x slower and produces variable chunk sizes that complicate budgeting
✓Overlap of 10 to 30 percent reduces boundary loss (missing key references at split points) but increases index size and retrieval cost proportionally
✓At billion-chunk scale, a 20 percent overlap means tens of millions of extra vectors to store and search, directly impacting infrastructure cost
📌 Examples
1Google-scale documentation systems often use semantic chunking with section headers, accepting roughly 500 microseconds of per-document overhead for better retrieval precision
2High-volume log ingestion pipelines prefer fixed 256-token chunks to process 100 million entries per day without parsing bottlenecks