Zero-Shot vs. Supervised Classification Trade-offs
How Zero-Shot Works
The model encodes both your input text and each label description into numerical vectors (embeddings). For the input "My card was charged twice," it produces a 768-dimensional vector, and it does the same for each candidate label: "billing issue," "delivery problem." The label whose vector has the highest cosine similarity to the input's wins.
Why does this work? Large language models develop internal representations of meaning during pretraining on billions of words. "Charged twice" and "billing issue" end up close in vector space because the model learned these concepts relate semantically.
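The scoring step can be sketched in a few lines. This is a toy illustration: the four-dimensional vectors below stand in for real 768-dimensional encoder output (in practice you would call a sentence-embedding model), and the `EMBED` table, `cosine`, and `zero_shot` names are hypothetical.

```python
import math

# Toy 4-dim "embeddings" standing in for a real encoder's 768-dim output.
EMBED = {
    "My card was charged twice": [0.9, 0.1, 0.0, 0.2],
    "billing issue":             [0.8, 0.2, 0.1, 0.1],
    "delivery problem":          [0.1, 0.9, 0.3, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def zero_shot(text, labels):
    """Pick the label whose embedding is most similar to the text's."""
    v = EMBED[text]
    return max(labels, key=lambda lab: cosine(v, EMBED[lab]))

print(zero_shot("My card was charged twice",
                ["billing issue", "delivery problem"]))  # billing issue
```

No training happens here: adding a new label only requires embedding its description and adding it to the candidate list.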
The Supervised Approach
Supervised classification takes a different path. You provide 1,000+ labeled examples: text paired with the correct category. The model learns patterns specific to your data: "charged twice" strongly predicts "billing," while "took 3 weeks" predicts "shipping." After training, you have a specialized model that knows only your labels but knows them extremely well.
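As a minimal sketch of the supervised path, here is a multinomial Naive Bayes classifier trained on a handful of labeled examples. Real pipelines would use scikit-learn or a fine-tuned transformer with 1,000+ examples; the tiny `train` set and the `fit`/`predict` helpers are illustrative assumptions, not a production recipe.

```python
import math
from collections import Counter, defaultdict

# A handful of (text, label) pairs; a real training set would be far larger.
train = [
    ("charged twice on my card", "billing"),
    ("double charge appeared", "billing"),
    ("package took 3 weeks", "shipping"),
    ("order still not delivered", "shipping"),
]

def fit(data):
    """Count word occurrences per label and label frequencies."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in data:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Score each label by log prior + smoothed log likelihoods."""
    vocab = {w for c in word_counts.values() for w in c}
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        n = sum(word_counts[label].values())
        for w in text.split():
            # Laplace smoothing so unseen words don't zero out the score.
            score += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

wc, lc = fit(train)
print(predict("card charged twice", wc, lc))  # billing
```

Note the trade-off the paragraph describes: this model only knows the labels it was trained on, and adding a new category means collecting labeled examples and retraining.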
Trade-off Summary
At 1M requests/day, zero-shot inference at 200 ms per request costs about 55 GPU-hours; a fine-tuned model at 10 ms costs about 2.8 GPU-hours, roughly 20x cheaper. But when you add a new category, zero-shot handles it instantly, while the supervised model requires relabeling data and retraining.
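The GPU-hour figures follow from simple arithmetic, under the simplifying assumption that each request fully occupies one GPU for its latency:

```python
# Back-of-envelope math for the cost comparison above.
requests_per_day = 1_000_000
zero_shot_s = 0.200   # 200 ms per request
fine_tuned_s = 0.010  # 10 ms per request

zs_hours = requests_per_day * zero_shot_s / 3600
ft_hours = requests_per_day * fine_tuned_s / 3600
print(round(zs_hours, 1), round(ft_hours, 1), round(zs_hours / ft_hours))
# 55.6 2.8 20
```

Real serving costs diverge from this (batching, GPU utilization, and memory pressure all matter), but the 20x latency ratio carries straight through to the GPU-hour ratio.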