What is a Trie and Why Autocomplete Uses It
The Filing Cabinet Analogy
Imagine organizing a phone book of 10 million entries into nested folders. The first folder is labeled "A", containing subfolders "AA", "AB", "AC" and so on. Inside "AM" you find "AMA", "AME", "AMO". To find "Amazon", you open A, then M, then A, then Z, then O, then N, touching exactly 6 folders. You never touch folders for "B" through "Z" because they are irrelevant to your prefix. This is why autocomplete feels instant: as you type each character, the system narrows possibilities by traversing exactly one edge per keystroke.
Why Tries Win for Autocomplete
When a user types "ama", the trie walks exactly 3 edges to reach the "ama" node, regardless of whether the dictionary contains 100,000 or 50 million words. From that node, all words starting with "ama" (amazon, amaze, amateur, amalgamate) are descendants reachable by continuing traversal. The lookup is O(k) where k is prefix length, typically 3 to 10 characters. With average edge traversal taking 20 to 50 nanoseconds (a pointer dereference plus cache lookup), a 6 character prefix resolves in 120 to 300 nanoseconds.
Contrast with Brute Force Scanning
Without a trie, autocomplete requires scanning the entire dictionary: does apple start with ama? No. Does amazon start with ama? Yes. Does amaze start with ama? Yes. Continue for all 50 million words. With 50 million words averaging 8 characters, that is 400 million character comparisons per query, taking 50 to 100 ms on modern hardware. With a trie, the same query takes 0.1 ms, a 500 to 1000x improvement that makes real time keystroke response possible.
What Each Node Stores
At minimum: child pointers (one per possible next character, typically 26 for lowercase English or 256 for ASCII) and a boolean flag indicating "a word ends here." The node for "a-m-a-z-o-n" has its word end flag set; the node for "a-m-a" does not (unless "ama" itself is a valid word). Production systems store additional metadata at word ending nodes: search frequency (how often users search for this term), recency weighting, category tags, and often precomputed top K completions to avoid traversing descendants at query time.