Temperature and Nucleus Sampling (Top P)
Temperature Scaling
Temperature divides the raw model scores (logits) before the softmax converts them to probabilities. If token A has logit 2.0 and token B has logit 1.0, the ratio P(A)/P(B) equals e^((2.0 − 1.0)/T): about 2.7 at temperature 1.0, 7.4 at 0.5, and 1.6 at 2.0. Lower temperature sharpens the distribution; higher temperature flattens it.
Temperature 0.5: Divide logits by 0.5 (multiply by 2). The gap between A and B doubles. A becomes more dominant. Output becomes more deterministic and predictable.
Temperature 2.0: Divide logits by 2. The gap between A and B halves. Lower-probability tokens get more chance. Output becomes more random and creative, but also more likely to produce nonsense.
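The effect is easy to verify with a small sketch (hypothetical helper name; logits 2.0 and 1.0 taken from the example above):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Token A (logit 2.0) vs token B (logit 1.0) at three temperatures
for t in (0.5, 1.0, 2.0):
    p_a, p_b = softmax_with_temperature([2.0, 1.0], t)
    print(f"T={t}: P(A)={p_a:.3f}, P(B)={p_b:.3f}")
```

At temperature 0.5, A's probability rises to about 0.88; at 2.0 it falls to about 0.62, illustrating the sharpening and flattening described above.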
Nucleus Sampling (Top P)
Instead of a fixed number of candidates, nucleus sampling keeps tokens until their cumulative probability reaches or exceeds the threshold P. If P = 0.9, add tokens (highest probability first) until they sum to at least 90%, renormalize the kept probabilities, then sample only from that set.
Why this works: The number of reasonable next tokens varies by context. After "The capital of France is" only 1-2 tokens make sense. After "I feel" dozens might work. Top P adapts automatically: tight distributions yield few candidates, flat distributions yield many.
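A minimal sketch of the filtering step (hypothetical function names; the two example distributions mirror the tight and flat contexts described above):

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p,
    then renormalize the kept probabilities to sum to 1."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for idx, prob in ranked:
        kept.append((idx, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {idx: prob / total for idx, prob in kept}

# Tight distribution ("The capital of France is"): one token dominates
print(top_p_filter([0.95, 0.03, 0.01, 0.01], 0.9))  # keeps only token 0
# Flat distribution ("I feel"): many candidates survive
print(top_p_filter([0.3, 0.25, 0.2, 0.15, 0.1], 0.9))  # keeps four tokens
```

The same P value yields one candidate in the first case and four in the second, which is the adaptive behavior that makes top P useful.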
Practical Settings
Using both together: temperature reshapes probabilities first, then top_p filters. Temperature 0.7 with top_p 0.9 gives controlled creativity: slightly sharper distribution, filtered to reasonable options.
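The two-stage pipeline can be sketched end to end (hypothetical helper names and example logits; real inference libraries implement this internally):

```python
import math

def apply_temperature(logits, temperature):
    """Step 1: divide logits by temperature, then softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_candidates(probs, p):
    """Step 2: indices of the smallest set with cumulative probability >= p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

logits = [3.0, 2.5, 1.0, 0.5, -1.0]          # example raw scores
probs = apply_temperature(logits, 0.7)        # reshape first
candidates = top_p_candidates(probs, 0.9)     # then filter to the nucleus
print(candidates)
```

With these example logits, temperature 0.7 concentrates mass on the top two tokens, and the 0.9 nucleus keeps exactly those two: the "controlled creativity" the text describes.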