What is Chunking in LLM Systems?
Definition
Chunking is the process of splitting large documents into smaller, retrievable units that can fit within an LLM's limited context window while preserving enough local context for the model to reason effectively.
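As a minimal sketch of the idea, the snippet below splits a document into fixed-size, overlapping chunks. Whitespace splitting stands in for real tokenization, and the chunk_size and overlap values are illustrative defaults rather than recommendations.

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size tokens.

    Whitespace splitting approximates tokenization; production systems
    typically count model tokens with a real tokenizer instead.
    """
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if not window:
            break
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Example: a long document becomes a handful of overlapping, retrievable units.
doc = "word " * 1000
print(len(chunk_text(doc)))  # 4 chunks of up to ~300 pseudo-tokens each
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of some duplicated storage.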
💡 Key Takeaways
✓ Context windows are limited: even 128k-token models cannot hold an entire knowledge base, requiring selective retrieval of relevant sections
✓ Chunks become the unit of retrieval: each chunk is embedded as a vector and stored in a searchable index for fast lookup at query time (see the sketch after this list)
✓ Typical chunk sizes range from 150 to 1,000 tokens, depending on the model's context window and how many retrieved chunks you want to fit into the prompt at once
✓ Chunking happens offline during ingestion, while retrieval happens online within strict latency budgets, typically 50 to 100 ms at p95
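To make the offline-ingestion versus online-retrieval split concrete, here is a minimal sketch of both stages. The embed function is a hypothetical stand-in for a real embedding model (it produces pseudo-random vectors keyed on the text, so the similarity scores carry no semantic meaning); only the mechanics of building an index once and querying it per request are illustrated.

```python
import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for an embedding model: pseudo-random unit
    vectors keyed on the text (stable within one process, not semantic)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


# Offline ingestion: every chunk is embedded once and stored in the index.
chunks = [
    "How to rotate API keys",
    "Deploying the billing service",
    "On-call escalation policy",
]
index = np.stack([embed(c) for c in chunks])  # shape: (num_chunks, dim)

# Online retrieval: embed the query and take the top-k most similar chunks.
query_vec = embed("who do I page during an incident?")
scores = index @ query_vec                    # cosine similarity (unit vectors)
top_k = np.argsort(scores)[::-1][:2]
print([chunks[i] for i in top_k])
```

In a real system the index would live in a vector database or approximate nearest-neighbor structure so that lookups stay within the tight online latency budget even across hundreds of millions of chunks.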
📌 Examples
1. A 100-million-page internal documentation system chunks each page into 4 to 8 segments, creating 400 to 800 million searchable chunks
2. ChatGPT-style systems chunk conversation history to keep recent messages within the context window while summarizing or dropping older turns, as sketched below
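A minimal sketch of the conversation-history case: keep the most recent turns that fit an approximate token budget and drop the rest. The word-count approximation and the budget value are assumptions; a production system would count real tokens and might replace the dropped span with a model-generated summary rather than discarding it.

```python
def fit_history(messages: list[dict], budget: int = 4000) -> list[dict]:
    """Keep the most recent turns whose combined (approximate) token count
    fits the budget; older turns are dropped. Word count stands in for tokens."""
    kept: list[dict] = []
    used = 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = len(msg["content"].split())
        if used + cost > budget:
            break                       # older turns no longer fit
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order


history = [
    {"role": "user", "content": "very old question " * 50},
    {"role": "assistant", "content": "very old answer " * 50},
    {"role": "user", "content": "latest question"},
]
print(fit_history(history, budget=200))  # oldest turn is dropped
```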