Decoding Failure Modes and Safety Controls
Repetition Loops
The model generates "I think that I think that I think that..." endlessly. This happens because once a phrase appears in the context, the model assigns higher probability to repeating it, and each repetition reinforces the next. Beam search amplifies this: the repetitive sequence accumulates high probability and can crowd out the other beams.
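Fix: Repetition penalty. Reduce the probability of tokens that have already appeared in the output. A penalty of 1.2 means the logits of previously seen tokens are divided by 1.2. Dividing only lowers a positive logit, so implementations such as Hugging Face Transformers multiply negative logits by the penalty instead. Too high a penalty (2.0+) causes the model to avoid legitimate repetition, such as pronouns.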
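The penalty described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's implementation: the function name apply_repetition_penalty, the toy logits, and the token ids are all assumptions made for the example.

```python
import math

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated tokens penalized.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so the adjustment always lowers the token's score.
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of 4 tokens; tokens 0 and 2 were already generated.
logits = [2.0, 1.0, -1.0, 0.5]
seen = [0, 2]
penalized = apply_repetition_penalty(logits, seen, penalty=1.2)

before = softmax(logits)
after = softmax(penalized)
print("token 0 probability before:", round(before[0], 3))
print("token 0 probability after: ", round(after[0], 3))
```

The unseen tokens keep their logits, so their share of the probability mass grows as the seen tokens lose theirs.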
Length Degeneration
Beam search favors shorter sequences because per-token probabilities multiply, so every additional token lowers the sequence score. A 10-token sequence with 0.9 per-token probability scores higher (0.9^10 ≈ 0.35) than a 20-token sequence with the same per-token probability (0.9^20 ≈ 0.12).
Fix: Length normalization. Divide the final score (the summed log-probability) by the sequence length, or by the length raised to a power alpha. An alpha of 0.6-0.8 balances the preference for brevity against completeness. Without normalization, the model outputs terse, incomplete responses.
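To make the arithmetic concrete, here is a small sketch that scores the 10-token and 20-token sequences from the example above, with and without normalization. It assumes the score is the summed log-probability divided by length^alpha; the function names are illustrative only.

```python
import math

def sequence_log_prob(token_probs):
    """Raw beam score: sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)

def length_normalized_score(token_probs, alpha=0.7):
    """Raw score divided by length ** alpha (alpha=0 disables normalization)."""
    return sequence_log_prob(token_probs) / (len(token_probs) ** alpha)

short_seq = [0.9] * 10   # 10 tokens, each with probability 0.9
long_seq = [0.9] * 20    # 20 tokens, each with probability 0.9

raw_gap = sequence_log_prob(short_seq) - sequence_log_prob(long_seq)
norm_gap = length_normalized_score(short_seq) - length_normalized_score(long_seq)

print("raw scores:       ", sequence_log_prob(short_seq), sequence_log_prob(long_seq))
print("normalized scores:", length_normalized_score(short_seq), length_normalized_score(long_seq))
print("gap before / after normalization:", raw_gap, norm_gap)
```

With alpha below 1 the shorter sequence still scores slightly higher, but the gap shrinks sharply. With alpha equal to 1 the two sequences tie, since their per-token quality is identical.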
Sampling Collapse
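High temperature combined with high top_p occasionally samples an extremely low-probability token. High temperature flattens the distribution, and a high top_p keeps most of the resulting tail eligible for sampling. Once one bad token enters the sequence, the model conditions on it, has no good continuations, and output quality collapses into nonsense. Lowering either temperature or top_p shrinks the tail the sampler can draw from.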
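The interaction between temperature and top_p can be illustrated with a toy distribution. The sketch below assumes a made-up 100-token vocabulary with two plausible tokens and 98 unlikely ones, and counts how many tokens remain eligible after top-p filtering at two temperatures. The helper names and the logit values are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, top_p):
    """Size of the smallest set of top tokens whose cumulative mass reaches top_p."""
    cumulative = 0.0
    for count, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    return len(probs)

# Two plausible tokens followed by a long tail of 98 unlikely tokens.
logits = [8.0, 7.0] + [0.0] * 98

for temperature in (1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    kept = nucleus_size(probs, top_p=0.95)
    tail_mass = sum(probs[2:])
    print(f"T={temperature}: tokens kept by top_p=0.95: {kept}, tail mass: {tail_mass:.3f}")
```

At the lower temperature only the two plausible tokens survive the filter. At the higher temperature the tail holds roughly half the probability mass, so most tail tokens stay eligible and one of them is likely to be sampled.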
Safety Controls
Output filtering: Run the generated text through a classifier before returning it to the user. Block responses that contain harmful content, personally identifiable information, or policy violations. Latency cost: 10-50ms per response.
Logit bias: Add a positive or negative offset to the logits of specific tokens during generation to raise or lower their probability. Setting a token's bias to negative infinity makes it impossible to generate. Used for preventing profanity, brand names, or competitor mentions.
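As a rough sketch of how logit bias works, the example below adds per-token offsets to a toy set of logits before sampling. The vocabulary, the token ids, the placeholder name "BrandX", and the apply_logit_bias helper are all invented for illustration; real APIs expose this control under their own parameter names.

```python
import math
import random

def apply_logit_bias(logits, bias):
    """Return a copy of `logits` with per-token offsets from `bias` added."""
    out = list(logits)
    for token_id, delta in bias.items():
        out[token_id] += delta
    return out

def softmax(logits):
    """Stable softmax; a logit of -inf maps to probability exactly 0.0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token vocabulary and raw logits from the model.
vocab = ["hello", "darn", "world", "BrandX"]
logits = [1.0, 2.5, 0.5, 1.5]

# Ban tokens 1 and 3 outright and nudge token 0 upward.
bias = {1: -math.inf, 3: -math.inf, 0: 2.0}

probs = softmax(apply_logit_bias(logits, bias))
samples = random.choices(vocab, weights=probs, k=1000)

print({word: round(p, 3) for word, p in zip(vocab, probs)})
print("distinct sampled tokens:", sorted(set(samples)))
```

Because the ban operates on individual tokens, a word that the tokenizer splits into several tokens is only blocked if the relevant pieces are banned, which may also block unrelated words that share those pieces.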