I agree; this blog assumes you run the machine at 80 to 85% GPU utilization.

For a small number of requests, it's usually better to use alternatives like GPT-3.5 or a PaLM model, which are cheaper.

But if your use case consumes far more tokens, self-hosting is the better alternative (for example, processing news articles or long document contexts, where the token volume is usually very high).
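To make the tradeoff concrete, here is a minimal break-even sketch. All the dollar figures are hypothetical placeholders I've chosen for illustration, not real prices from any vendor; the point is only the shape of the calculation.

```python
# Hypothetical break-even sketch: at what monthly token volume does a
# self-hosted GPU machine become cheaper than a pay-per-token API?
# All numbers below are illustrative placeholders, not real prices.

def break_even_tokens(gpu_cost_per_month: float,
                      api_cost_per_1k_tokens: float) -> float:
    """Monthly token volume above which self-hosting wins."""
    return gpu_cost_per_month / api_cost_per_1k_tokens * 1_000

# Placeholder figures: $2,000/month for the GPU machine,
# $0.002 per 1K tokens for the API.
tokens = break_even_tokens(2_000.0, 0.002)
print(f"{tokens:,.0f} tokens/month")  # prints "1,000,000,000 tokens/month"
```

Below that volume the API is cheaper; above it, the fixed GPU cost amortizes out, which is exactly why high-volume batch workloads like news-article processing favor self-hosting.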

And on your point about vendor dependency: it should usually be easy to replicate and test prompts across models, except when you rely on something like OpenAI functions, which makes migrating to another vendor more difficult.

Written by Mastering LLM (Large Language Model)

MasteringLLM is an AI-first EdTech company making LLM learning simpler with its visual content. Look out for our LLM Interview Prep & AgenticRAG courses.
