Aug 28, 2024
The formula is only a rough approximation. It doesn't take into account the model architecture, number of concurrent users, KV cache, number of tokens, etc.
You can get a closer estimate of the memory needed to load a given model with the Hugging Face model size estimator: https://huggingface.co/docs/accelerate/en/usage_guides/model_size_estimator
Unfortunately, calculating the exact requirement for a given number of users is difficult; it will likely require some benchmarking on your own infrastructure.
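
To give a feel for why users and token counts matter so much, here is a minimal Python sketch of the commonly used KV cache size estimate for a standard transformer decoder (one key and one value tensor per layer). The function name and the config values are assumptions for illustration only and are not taken from any specific model card; real serving stacks add overhead (activations, fragmentation, paging) that this ignores, so treat it as a lower bound and not a replacement for benchmarking.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, num_users, bytes_per_value=2):
    """Rough KV cache size in bytes.

    Assumes every user holds a full seq_len context at once and that
    values are stored in 16-bit precision (bytes_per_value=2) by default.
    """
    # Factor 2: one key vector and one value vector per token, per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * num_users


# Hypothetical 7B-class configuration (illustrative values only).
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)

for users in (1, 8, 32):
    size = kv_cache_bytes(**cfg, seq_len=4096, num_users=users)
    print(f"{users:>2} users x 4096 tokens: {size / 1024**3:.1f} GiB of KV cache")
```

With these assumed values the cache grows linearly with both context length and number of concurrent users, which is why a weights-only formula can be far off for multi-user serving.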