
Zero-Shot vs. Supervised Classification Trade-offs

Core Concept
Zero-shot classification uses a pretrained language model to assign labels without any task-specific training. Supervised classification requires labeled training data and produces a model built specifically for your labels.

How Zero-Shot Works

The model encodes both your input text and each label description into numerical vectors (embeddings). For the input "My card was charged twice," it might create a 768-dimensional vector (the embedding size of BERT-base-class encoders). It does the same for each candidate label, such as "billing issue" and "delivery problem." The label whose vector is most similar to the input's vector, commonly measured by cosine similarity, wins.

Why does this work? Large language models develop internal representations of meaning during pretraining on billions of words. "Charged twice" and "billing issue" end up close in vector space because the model has learned that these concepts are semantically related.
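The encode-and-compare step can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: `toy_embed` and `TOY_CONCEPTS` are hypothetical stand-ins for a pretrained encoder, using a hand-built two-dimensional "concept" lexicon instead of learned 768-dimensional vectors, and cosine similarity is assumed as the similarity measure. Only the scoring logic in `zero_shot_classify` would carry over to a real model.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def zero_shot_classify(
    text: str, labels: Sequence[str], embed: Callable[[str], Vector]
) -> Dict[str, float]:
    """Score every candidate label by embedding similarity to the input text."""
    text_vec = embed(text)
    return {label: cosine_similarity(text_vec, embed(label)) for label in labels}


# HYPOTHETICAL stand-in for a pretrained encoder: each known word maps to a
# hand-picked concept dimension [billing, shipping]. A real model returns a
# dense vector learned during pretraining instead.
TOY_CONCEPTS = {
    "card": [1.0, 0.0],
    "charged": [1.0, 0.0],
    "billing": [1.0, 0.0],
    "delivery": [0.0, 1.0],
    "weeks": [0.0, 1.0],
    "shipping": [0.0, 1.0],
}


def toy_embed(text: str) -> Vector:
    """Sum the concept vectors of known words; unknown words contribute nothing."""
    vec = [0.0, 0.0]
    for word in text.lower().replace(",", " ").split():
        for i, value in enumerate(TOY_CONCEPTS.get(word, [0.0, 0.0])):
            vec[i] += value
    return vec


scores = zero_shot_classify(
    "My card was charged twice", ["billing issue", "delivery problem"], toy_embed
)
best = max(scores, key=scores.get)
print(best)  # billing issue
```

Because label embeddings depend only on the label strings, a production version would typically compute them once and reuse them across requests. Swapping `toy_embed` for a real sentence encoder leaves `zero_shot_classify` unchanged.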

The Supervised Approach

Supervised classification takes a different path. You provide 1,000+ labeled examples: text paired with the correct category. The model learns patterns specific to your data: "charged twice" strongly predicts "billing," while "took 3 weeks" predicts "shipping." After training, you have a specialized model that knows only your labels but knows them extremely well.
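To make "learns patterns specific to your data" concrete, here is a minimal sketch using multinomial Naive Bayes, swapped in as a simple stand-in for whatever model you would actually train or fine-tune (the section does not name one). The `NaiveBayesClassifier` class and the six training examples are hypothetical and far smaller than a realistic labeled set; the point is only that the model learns word-to-label associations from labeled pairs.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (commas treated as spaces)."""
    return text.lower().replace(",", " ").split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesClassifier":
        # Count how often each label occurs and how often each word appears under it.
        for text, label in examples:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text: str) -> Optional[str]:
        total_examples = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label: Optional[str] = None
        best_score = -math.inf
        for label, count in self.class_counts.items():
            # Log prior: how common this label is in the training data.
            score = math.log(count / total_examples)
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                # Smoothed log likelihood, so an unseen (word, label) pair is not zero.
                score += math.log(
                    (self.word_counts[label][word] + 1) / (label_total + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set: each text paired with its correct category.
train = [
    ("I was charged twice", "billing"),
    ("refund the extra charge on my card", "billing"),
    ("my invoice shows the wrong amount", "billing"),
    ("the package took 3 weeks to arrive", "shipping"),
    ("delivery is late again", "shipping"),
    ("tracking says it took weeks", "shipping"),
]

model = NaiveBayesClassifier().fit(train)
print(model.predict("My card was charged twice"))  # billing
print(model.predict("my order took 3 weeks"))      # shipping
```

The classifier can only ever output labels it saw during `fit`, which is the "knows only your labels" property: adding a category means collecting examples for it and training again.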

Trade-off Summary

⚠️ Zero-shot: No training data needed; labels can be added instantly. Accuracy 70-85%. Latency 100-300ms, because every candidate label must be scored on each request.
💡 Supervised: Needs 500-2000 labeled examples per class. Accuracy 90-95%. Latency 5-20ms after training.

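At 1M requests/day, zero-shot at 200ms per request consumes about 55 GPU-hours of compute per day (1M × 0.2 s = 200,000 s ≈ 55.6 hours), while a fine-tuned model at 10ms needs about 2.8 GPU-hours: 20x cheaper. These figures assume each request occupies the GPU for its full latency, without batching. But when you add a new category, zero-shot handles it instantly, while supervised requires relabeling and retraining.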
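The GPU-hour figures come from multiplying daily requests by per-request latency. The sketch below reproduces that arithmetic under the simplifying assumption the figures rely on: each request occupies one GPU for its full latency, with no batching or concurrency (batching would lower both numbers). `gpu_hours_per_day` is a hypothetical helper name.

```python
SECONDS_PER_HOUR = 3600


def gpu_hours_per_day(requests_per_day: int, latency_ms: float) -> float:
    """GPU-hours per day, assuming each request holds one GPU for its full
    latency and requests are processed one at a time (no batching)."""
    return requests_per_day * (latency_ms / 1000) / SECONDS_PER_HOUR


zero_shot = gpu_hours_per_day(1_000_000, 200)   # ~55.6
supervised = gpu_hours_per_day(1_000_000, 10)   # ~2.8
print(
    f"zero-shot: {zero_shot:.1f} GPU-h/day, "
    f"supervised: {supervised:.1f} GPU-h/day, "
    f"ratio: {zero_shot / supervised:.0f}x"
)
# prints: zero-shot: 55.6 GPU-h/day, supervised: 2.8 GPU-h/day, ratio: 20x
```

The ratio depends only on the latency ratio (200ms / 10ms), so it stays 20x at any request volume, while the absolute GPU-hours scale linearly with traffic.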

💡 Key Takeaways
Zero-shot classification scores text against labels by similarity in a pretrained model's embedding space; no training data is required
Supervised classification needs 500-2000 labeled examples per class but reaches 90-95% accuracy, vs. 70-85% for zero-shot
Zero-shot latency is 100-300ms; supervised latency is 5-20ms after training
At scale, supervised is roughly 10-20x cheaper, but zero-shot allows instant label changes
Choose zero-shot for prototyping and supervised for stable, high-volume production
📌 Interview Tips
1. Explain the accuracy vs. flexibility trade-off: supervised reaches 90%+ accuracy but requires collecting labeled data
2. Zero-shot is ideal for POCs: you can test with zero training data and add labels instantly
3. At 1M requests/day, 200ms vs. 10ms inference is a 20x compute difference
Zero Shot vs Supervised Classification Trade-offs | Text Classification at Scale - System Overflow