How to Select Right LLM model for your use case

4 min readSep 7, 2024

When you begin any client project, one of the most frequently asked questions is, “Which model should I use?” There isn’t a straightforward answer to this; it’s a process. In this blog, we’ll explain that process so that next time your client asks you this question, you can share this document with them. 😁

Choosing the right model, whether GPT4 Turbo, Gemini Pro , Gemini Flash GPT-4o or a smaller option like GPT-4o-mini, requires balancing accuracy, latency, and cost.

Core Principles

The principles for model selection are simple:

Optimize for accuracy first: Optimize for accuracy until you hit your accuracy target.
Optimize for cost and latency second: Then aim to maintain accuracy with the cheapest, fastest model possible.

Are you preparing for Gen AI interview ? Look for our LLM Interview preparation Course

100+ Questions spanning 14 categories & Real Case Studies
Curated 100+ assessments for each category
Well-researched real-world interview questions based on FAANG & Fortune 500 companies
Focus on Visual learning
Certificate of completion

50% off Coupon Code — LLM50

Link for the course :

Large Language Model (LLM) Interview Question And Answer Course

Dive deep into the world of AI with this comprehensive large language model (LLM) interview questions & answer course…

www.masteringllm.com

Focus on Accuracy First

Set a Clear Accuracy Goal : Define what accuracy is “good enough” for your use case. Example: 90% of customer service calls triaged correctly on the first interaction
Develop an Evaluation Dataset : Create a dataset to measure the model’s performance. Example: Capture 100 interaction examples, including user requests, model triage, correct triage, and accuracy
Use the Most Powerful Model: Start with the most capable model to achieve your accuracy targets. Log responses for future use.
Optimize for Accuracy: Use retrieval-augmented generation & then Fine-tune for consistency and behavior
Collect Data for Future Use: Gather prompt and completion pairs for evaluations, few-shot learning, or fine-tuning. This practice, known as prompt baking, helps produce high-quality examples for future use.

Optimize cost and latency

Cost and latency are considered secondary because if the model can’t hit your accuracy target then these concerns are moot. However, once you’ve got a model that works for your use case, you can take one of two approaches:

Compare with a smaller model zero- or few-shot: Swap out the model for a smaller, cheaper one and test whether it maintains accuracy at the lower cost and latency point.
Model distillation: Fine-tune a smaller model using the data gathered during accuracy optimization.
Cost and latency are typically interconnected; reducing tokens and requests generally leads to faster processing.

The main strategies to consider here are:

Reduce requests: Limit the number of necessary requests to complete tasks.
Minimize tokens: Lower the number of input tokens and optimize for shorter model outputs.
Select a smaller model: Use models that balance reduced costs and latency with maintained accuracy.

Practical example from open AI

To demonstrate these principles, they have develop a fake news classifier with the following target metrics:

Accuracy: Achieve 90% correct classification
Cost: Spend less than $5 per 1,000 articles
Latency: Maintain processing time under 2 seconds per article

Experiments

They ran three experiments to reach goal:

Zero-shot: Used GPT-4o with a basic prompt for 1,000 records, but missed the accuracy target.
Few-shot learning: Included 5 few-shot examples, meeting the accuracy target but exceeding cost due to more prompt tokens.
Fine-tuned model: Fine-tuned GPT-4o-mini with 1,000 labeled examples, meeting all targets with similar latency and accuracy but significantly lower costs.

Conclusion

Optimize for accuracy first & followed by Optimization for cost and latency.
This process is important — you often can’t jump right to fine-tuning because you don’t know whether fine-tuning is the right tool for the optimization you need, or you don’t have enough labeled examples.
Use a large accurate model to achieve your accuracy targets, and curate a good training set — then go for a smaller, more efficient model with fine-tuning.

Credit to Open AI team. Original content.