I agree, this blog assumes you keep the machine at roughly 80 to 85% GPU utilisation.
For a small number of requests it's almost always cheaper to use an alternative like GPT-3.5 or a PaLM model.
But if you have a use case that burns through a lot more tokens, then this is the better alternative (for example, processing news articles or long document context, where token counts are usually very high).
And on your point about vendor dependency: it should usually be easy to replicate and test prompts between models, unless you are utilising something like OpenAI functions, which makes it much harder to migrate to another vendor.
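To make that concrete, here's a minimal sketch of the difference. The `call_openai` / `call_palm` helpers are hypothetical stand-ins for the real vendor SDKs (an assumption for illustration, not the actual APIs): a plain prompt is just a string you can point at any backend, while a function-calling request also carries an OpenAI-shaped JSON schema and expects structured tool calls back, and that layer has no drop-in equivalent elsewhere.

```python
# Hypothetical stand-ins for the vendor SDKs; the point is structural,
# not the exact client API.
def call_openai(prompt: str) -> str:
    return f"[openai] {prompt}"  # would wrap the OpenAI completion call

def call_palm(prompt: str) -> str:
    return f"[palm] {prompt}"    # would wrap the PaLM completion call

# Portable case: a plain prompt swaps between vendors with one line.
PROMPT = "Summarise this news article: {article}"
for backend in (call_openai, call_palm):
    print(backend(PROMPT.format(article="...")))

# Lock-in case: an OpenAI function-calling request also ships a JSON
# schema and needs a dispatch loop for the tool calls it returns.
# Migrating means rewriting this layer, not just moving the prompt.
OPENAI_FUNCTION_SPEC = {
    "name": "extract_entities",
    "description": "Pull named entities out of an article",
    "parameters": {
        "type": "object",
        "properties": {
            "entities": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["entities"],
    },
}
```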