# Interview Questions on Large Language Models (LLMs)
❓ Question — What are your strategies to calculate the cost of running LLMs?
Answer:
The cost of running LLMs falls into two categories, depending on the type of model:
🌟 𝗣𝗿𝗶𝘃𝗮𝘁𝗲 𝗟𝗟𝗠𝘀 (𝗟𝗶𝗸𝗲 𝗚𝗣𝗧 𝟯.𝟱 𝗼𝗿 𝟰 𝗠𝗼𝗱𝗲𝗹𝘀):
Private LLMs usually price usage by counting either the 𝗻𝘂𝗺𝗯𝗲𝗿 𝗼𝗳 𝘁𝗼𝗸𝗲𝗻𝘀 (GPT 3.5 or 4) or the 𝗻𝘂𝗺𝗯𝗲𝗿 𝗼𝗳 𝗰𝗵𝗮𝗿𝗮𝗰𝘁𝗲𝗿𝘀 (PaLM). The cost splits into two parts:
📝 𝙋𝙧𝙤𝙢𝙥𝙩 𝙤𝙧 𝙞𝙣𝙥𝙪𝙩 𝙩𝙤𝙠𝙚𝙣𝙨 𝙤𝙧 𝙘𝙝𝙖𝙧𝙖𝙘𝙩𝙚𝙧𝙨
📤 𝘾𝙤𝙢𝙥𝙡𝙚𝙩𝙞𝙤𝙣 𝙤𝙧 𝙤𝙪𝙩𝙥𝙪𝙩 𝙩𝙤𝙠𝙚𝙣𝙨 𝙤𝙧 𝙘𝙝𝙖𝙧𝙖𝙘𝙩𝙚𝙧𝙨
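As a minimal sketch, a per-request cost under token-based pricing combines these two parts, billed at different rates. The per-1K-token rates below are illustrative placeholders, not current prices for any provider:

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost of one API call: input and output tokens are billed at different rates."""
    return (prompt_tokens / 1000) * price_in_per_1k \
         + (completion_tokens / 1000) * price_out_per_1k

# Example with placeholder rates ($ per 1K tokens):
cost = request_cost(prompt_tokens=1200, completion_tokens=300,
                    price_in_per_1k=0.0015, price_out_per_1k=0.002)
print(f"${cost:.4f}")  # 1.2 * 0.0015 + 0.3 * 0.002 = $0.0024
```

Multiply this per-request figure by your expected request volume to project a monthly bill.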
𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆:
1. Prompt tokens are usually easy to calculate. For GPT 3.5 or 4, you can use the 𝘁𝗶𝗸𝘁𝗼𝗸𝗲𝗻 library to find the exact number of tokens. Find a detailed notebook to calculate the number of tokens for different OpenAI models.
2. Since output tokens depend on the specific task, there are several strategies to approximate the count:
a. Take a 𝘀𝘁𝗮𝘁𝗶𝘀𝘁𝗶𝗰𝗮𝗹𝗹𝘆 𝘀𝗶𝗴𝗻𝗶𝗳𝗶𝗰𝗮𝗻𝘁 𝘀𝗮𝗺𝗽𝗹𝗲 of requests and calculate the average number of output tokens.
b. Limit 𝗺𝗮𝘅 𝗼𝘂𝘁𝗽𝘂𝘁 𝘁𝗼𝗸𝗲𝗻𝘀 in the API response.
c. Try to 𝗿𝗲𝘀𝘁𝗿𝗶𝗰𝘁 𝘁𝗵𝗲 𝗼𝘂𝘁𝗽𝘂𝘁 to a structured format instead of free text; for example, constrain the model to return specific JSON key-value pairs.
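The strategies above can be sketched together: count prompt tokens exactly with tiktoken when it is installed, fall back to a rough ~4-characters-per-token heuristic otherwise (a common approximation, not an exact rule), and estimate completion length from a sample of real outputs. The sample strings below are hypothetical:

```python
import statistics

def count_tokens(text: str) -> int:
    """Token count: exact via tiktoken if available, else a rough
    ~4-characters-per-token heuristic (approximation only)."""
    try:
        import tiktoken
        enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
        return len(enc.encode(text))
    except ImportError:
        return max(1, len(text) // 4)

def avg_output_tokens(sample_outputs: list[str]) -> float:
    """Strategy (a): average completion length over a representative sample."""
    return statistics.mean(count_tokens(o) for o in sample_outputs)

# Hypothetical sample of restricted JSON outputs (strategy c keeps these short):
sample = ['{"sentiment": "positive"}', '{"sentiment": "negative"}']
print(avg_output_tokens(sample))
```

Feeding this average into a per-request cost formula gives a defensible completion-cost estimate even before the system goes live.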
🚀 𝗢𝗽𝗲𝗻 𝗦𝗼𝘂𝗿𝗰𝗲 𝗟𝗟𝗠𝘀:
The cost of an open-source LLM depends on its license. If the model is available for commercial use without any restrictions:
a. You can 𝗰𝗿𝗲𝗮𝘁𝗲 𝗮 𝗯𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸 by running parallel requests on a GPU machine.
b. Measure the number of tokens processed and the time taken.
c. This gives you a throughput figure, i.e., X tokens per minute.
d. From that, calculate how much time is required to process your full token workload.
e. You can find the cost of running that instance on the cloud.
f. You can find a detailed cost comparison of running Mistral AI vs GPT 4 in this article.
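Steps (c) through (e) above reduce to back-of-the-envelope arithmetic. The throughput, workload, and hourly instance price below are hypothetical benchmark numbers, not measurements:

```python
def gpu_workload_cost(tokens_per_min: float, total_tokens: float,
                      hourly_instance_price: float) -> float:
    """Convert benchmarked throughput (steps a-b) into a cloud cost estimate.
    tokens_per_min comes from timing parallel requests on the GPU instance."""
    minutes_needed = total_tokens / tokens_per_min
    hours_needed = minutes_needed / 60
    return hours_needed * hourly_instance_price

# Hypothetical benchmark: 6,000 tokens/min on an instance billed at $1.50/hour,
# against a workload of 100M tokens:
cost = gpu_workload_cost(6_000, 100_000_000, 1.50)
print(f"${cost:,.2f}")  # → $416.67
```

Comparing this figure against the equivalent private-API bill for the same workload tells you which option is cheaper at your scale.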
If the open-source model has a restricted commercial license, you may also need to factor in licensing terms tied to the revenue generated by its output. This gives an approximate total cost of running the LLM.
💬 #Comment below, 𝘄𝗵𝗮𝘁 𝗶𝘀 𝘆𝗼𝘂𝗿 𝘀𝘁𝗿𝗮𝘁𝗲𝗴𝘆 𝗳𝗼𝗿 𝗰𝗮𝗹𝗰𝘂𝗹𝗮𝘁𝗶𝗻𝗴 𝘁𝗵𝗲 𝗰𝗼𝘀𝘁 𝗼𝗳 𝗿𝘂𝗻𝗻𝗶𝗻𝗴 𝘀𝘂𝗰𝗵 𝗺𝗼𝗱𝗲𝗹𝘀?
Your feedback as comments and claps encourages us to create better content for the community.