I agree; this blog assumes you run the machine at 80 to 85% GPU utilization.

For a small number of requests, it's usually better to use alternatives like GPT-3.5 or a PaLM model, which are cheaper.

But if your use case consumes far more tokens, self-hosting is the better alternative (for example, processing news articles or long document contexts, where the token volume is usually very high).
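To make the tradeoff concrete, here is a minimal break-even sketch. All the dollar figures are hypothetical placeholders I've chosen for illustration, not real prices from any vendor; the point is only the shape of the calculation.

```python
# Hypothetical break-even sketch: at what monthly token volume does a
# self-hosted GPU machine become cheaper than a pay-per-token API?
# All numbers below are illustrative placeholders, not real prices.

def break_even_tokens(gpu_cost_per_month: float,
                      api_cost_per_1k_tokens: float) -> float:
    """Monthly token volume above which self-hosting wins."""
    return gpu_cost_per_month / api_cost_per_1k_tokens * 1_000

# Placeholder figures: $2,000/month for the GPU machine,
# $0.002 per 1K tokens for the API.
tokens = break_even_tokens(2_000.0, 0.002)
print(f"{tokens:,.0f} tokens/month")  # prints "1,000,000,000 tokens/month"
```

Below that volume the API is cheaper; above it, the fixed GPU cost amortizes out, which is exactly why high-volume batch workloads like news-article processing favor self-hosting.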

And on your point about vendor dependency: it should usually be easy to replicate and test prompts across models, except when you rely on something like OpenAI functions, which makes migrating to another vendor more difficult.

Written by Mastering LLM (Large Language Model)

MasteringLLM is an AI-first EdTech company making LLM learning simpler with its visual content. Look out for our LLM Interview Prep & AgenticRAG courses.
