Natural Language Processing Systems • Text Generation (Beam Search, Sampling, Decoding)
Temperature and Nucleus Sampling (Top P)
Sampling-based decoding injects controlled randomness to increase diversity. Temperature rescales the logits before the softmax: with temperature T, divide each logit by T, then apply softmax. Temperature below 1.0 sharpens the distribution, making high-probability tokens even more likely and producing conservative outputs. Temperature above 1.0 flattens the distribution, spreading probability mass to less likely tokens and increasing creativity but also the risk of incoherence. Temperature of exactly 1.0 leaves the distribution unchanged.
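To make the rescaling concrete, here is a minimal PyTorch sketch that applies temperature to a toy logit vector; the logit values are made up purely for illustration.

```python
import torch

def temperature_softmax(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Divide each logit by T, then apply softmax over the vocabulary."""
    return torch.softmax(logits / temperature, dim=-1)

# Toy logits for a 4-token vocabulary (illustrative values, not from a real model).
logits = torch.tensor([4.0, 3.0, 1.0, 0.5])

print(temperature_softmax(logits, 0.5))  # sharper: the top token dominates
print(temperature_softmax(logits, 1.0))  # plain softmax, distribution unchanged
print(temperature_softmax(logits, 2.0))  # flatter: probability spreads to the tail
```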
Top p sampling, also called nucleus sampling, dynamically selects the smallest set of tokens whose cumulative probability reaches a threshold p, then samples within that set after renormalizing. Unlike top k, which always keeps exactly k candidates, top p adapts the candidate set size to the shape of the distribution: when the model is confident, the nucleus might contain only 5 tokens; when it is uncertain, it might include 200. This adaptive behavior makes top p widely preferred for open-ended generation.
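A sketch of one nucleus-sampling step under the same toy setup (the function name is mine, not from any particular library):

```python
import torch

def top_p_sample(probs: torch.Tensor, top_p: float) -> int:
    """Sample from the smallest set of tokens whose cumulative probability reaches top_p."""
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep every token needed to cross the threshold, including the one that crosses it.
    cutoff = int((cumulative < top_p).sum().item()) + 1
    nucleus = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()  # renormalize inside the nucleus
    choice = torch.multinomial(nucleus, num_samples=1)
    return int(sorted_idx[choice].item())

# Confident distribution: with top_p=0.9 the nucleus holds just two tokens.
probs = torch.tensor([0.70, 0.25, 0.03, 0.02])
print(top_p_sample(probs, top_p=0.9))
```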
Production systems combine temperature with top p. OpenAI defaults to temperature 1.0 and allows user override. Anthropic's Claude uses temperature around 1.0 with top p near 0.95 for chat. Typical ranges are temperature 0.7 to 1.0 and top p 0.9 to 0.97. Going below temperature 0.5 or top p 0.85 makes outputs very repetitive and generic. Going above temperature 1.5 or using top p 0.99 often produces incoherent tangents. Very high temperature with very small top p can create a trap: you flatten probabilities over a tiny candidate set, causing random oscillation between just a few tokens.
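These knobs are exposed directly by hosted APIs. As an illustration, a request through the OpenAI Python SDK can set both parameters in one call; the model name below is a placeholder and the prompt is arbitrary, so treat this as a sketch of the parameter plumbing rather than a recommended configuration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize nucleus sampling in one sentence."}],
    temperature=0.8,      # inside the typical 0.7-1.0 range
    top_p=0.95,           # inside the typical 0.9-0.97 range
)
print(response.choices[0].message.content)
```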
Sampling keeps one hypothesis per request, so memory scales linearly with concurrent users, not beam width. For a 7B model generating 200 tokens, one sampled sequence uses about 100 MB of KV cache. This lets continuous batching schedulers pack roughly 60 users per GPU instead of 15 with beam width 4. The tradeoff is that sampling gives up the high-likelihood paths that beam search searches for, but user studies consistently show that people prefer diverse sampled outputs over safe beam search results in chat and creative tasks.
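The memory numbers above can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes a Llama-2-7B-like shape (32 layers, 4096 hidden size, fp16 KV entries) and a roughly 6 GB KV cache budget per GPU; the shape and budget are assumptions for illustration, only the 100 MB and 60-versus-15 figures come from the text.

```python
# KV cache sizing sketch. Assumed model shape: 32 layers, 4096 hidden size,
# fp16 (2 bytes per element); the ~6 GB KV budget per GPU is also an assumption.
n_layers, hidden_size, bytes_per_elem = 32, 4096, 2
kv_bytes_per_token = 2 * n_layers * hidden_size * bytes_per_elem  # keys + values ~= 0.5 MB

seq_len = 200
mb_per_sequence = kv_bytes_per_token * seq_len / 2**20  # ~= 100 MB per sampled sequence

kv_budget_mb = 6 * 1024
print(f"{mb_per_sequence:.0f} MB per sequence")
print("sampled sequences per GPU:", int(kv_budget_mb // mb_per_sequence))        # ~60
print("beam-4 requests per GPU:  ", int(kv_budget_mb // (4 * mb_per_sequence)))  # 15
```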
💡 Key Takeaways
•Temperature divides logits before softmax: values below 1.0 sharpen the distribution, above 1.0 flatten it. Temperature 0.5 doubles every logit gap, squaring pairwise probability ratios, while 2.0 halves the gaps and spreads probability more evenly
•Top p (nucleus sampling) adapts candidate set size to distribution confidence: might select 5 tokens when model is certain, 200 when uncertain, unlike fixed top k
•Production defaults are temperature 0.7 to 1.0 and top p 0.9 to 0.97. OpenAI and Anthropic use these ranges for chat, prioritizing diversity over pure likelihood maximization
•Sampling uses 100 MB KV cache per 200 token sequence on 7B models, allowing 60 concurrent users per GPU versus 15 with beam width 4, improving throughput 4x
•Very high temperature with very small top p creates a failure mode: flattening probabilities over a tiny set causes random oscillation between few tokens with no coherent pattern
📌 Examples
Claude chat with temperature 1.0 and top p 0.95: At one step, top 50 tokens sum to 0.95 probability, model samples "explained" instead of always picking "said", creating natural variety
Coding assistant with temperature 0.3 and top p 0.9: Sharper distribution favors common patterns like "return" and "if", reducing syntax errors but still allowing occasional creative solutions
Creative writing with temperature 1.5 and top p 0.98: Story generation includes unexpected words like "shimmering" instead of "bright", but some sentences become incoherent tangents about unrelated topics