LLM Tokens: The Key to Efficient AI API Usage

LLM Tokens 101: let’s talk about the hidden cost behind AI API calls

1. How do LLM companies bill you?

LLM companies bill you based on how many tokens you use. A token is a small unit of text that a language model processes: it can be as short as a single character or as long as a whole word, depending on the language and the tokenisation scheme.

  • For English, a token is often roughly 4 characters or about ¾ of a word.
  • For other languages (like Korean, Chinese), a token may represent a part of a word, an entire word, or a character.
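If you want to see this in practice, below is a minimal sketch using OpenAI’s open-source tiktoken library. Other providers ship their own tokenisers, so exact counts vary by model.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by several OpenAI models;
# other models and providers use different encodings.
enc = tiktoken.get_encoding("cl100k_base")

text = "Language models read text as tokens, not characters."
tokens = enc.encode(text)

print(f"{len(text)} characters -> {len(tokens)} tokens")
# Typical English prose lands near 4 characters per token.
```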

2. How tokens are used

  • Counting usage: Every word, punctuation mark, and space you send to or receive from the API counts towards your token total. Both your input (your prompt/question) and the model’s output (the response) are counted.
  • Billing: Most AI API providers charge per token processed, typically quoted per 1,000 (or per million) tokens, often at different rates for input and output.
  • Limits: There are often maximum token limits per request (for example, 4,096 or 8,192 tokens), affecting how much text you can process at once.

As an example, suppose you sent a prompt and received the response below:

  • Prompt: “How do companies measure AI API usage in tokens?”
  • Response: “Companies use tokens to track and bill API usage. Each token represents a short part of the text.”

If you count the tokens (using the model’s own tokeniser), the prompt comes to around 10 tokens and the response to around 15.

In this situation, you will be charged for approximately 25 tokens:

  •  Input (10) + Output (15) = 25 tokens.

If the rate is $0.002 per 1,000 tokens, this request costs 25 ÷ 1,000 × $0.002 = $0.00005.
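The same arithmetic in code (the rate is illustrative, not any provider’s actual price):

```python
# Illustrative billing arithmetic; actual providers typically
# charge different per-token rates for input and output.
input_tokens = 10
output_tokens = 15
price_per_1k = 0.002  # USD per 1,000 tokens (example rate)

cost = (input_tokens + output_tokens) / 1000 * price_per_1k
print(f"${cost:.6f}")  # $0.000050
```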

3. How to optimise your use of tokens

As a user, you want to keep the amount you are charged down. The six simple steps below can reduce your token usage while preserving output quality.

1. Keep Prompts Concise

  • Remove redundant or irrelevant words.
  • Focus only on the necessary context and instructions.
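To make the saving concrete, here is a sketch comparing a padded prompt with a trimmed one, counting tokens with tiktoken as before:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

verbose = ("Hello! I was wondering, if it's not too much trouble, "
           "whether you could possibly summarise the following "
           "article for me in a few sentences, please?")
concise = "Summarise this article in 3 sentences:"

print(len(enc.encode(verbose)), "vs", len(enc.encode(concise)))
# The concise prompt carries the same instruction in far fewer tokens.
```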

2. Use Clear Instructions

  • Give precise directions to the model to avoid back-and-forth clarifications, which can increase token usage.

3. Limit Output Length

  • Use settings like max_tokens or similar parameters to set a limit on response length.
  • Ask for summaries or concise outputs when possible.
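With the official OpenAI Python client, for instance, this is the max_tokens parameter; a sketch, noting that other providers expose the same cap under different names:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Summarise tokenisation in two sentences."}],
    max_tokens=60,  # hard cap on output tokens, and so on output cost
)
print(response.choices[0].message.content)
```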

4. Reuse Context Effectively

  • In chat/completion models, avoid repeating the same context or information in every request unless required.
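A common tactic is to cap how much history you resend. Below is a deliberately naive sketch; real applications often summarise older turns instead of simply dropping them:

```python
# Keep the system prompt plus only the most recent messages,
# instead of resending the entire transcript on every call.
MAX_MESSAGES = 4  # illustrative cap; tune for your application

def trim_history(messages):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-MAX_MESSAGES:]

history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is a token?"},
    {"role": "assistant", "content": "A small unit of text a model processes."},
    {"role": "user", "content": "And how is billing calculated?"},
]
print(trim_history(history))
```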

5. Batch Processing

  • When appropriate, combine multiple related queries into one request instead of making several separate requests.
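A sketch of the idea: several small questions folded into one prompt, so the shared instructions and any context are sent, and billed, only once:

```python
questions = [
    "What is a token?",
    "How are input tokens billed?",
    "What does max_tokens control?",
]

# One request carrying all three questions replaces three requests,
# each of which would repeat the same instruction overhead.
numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
prompt = f"Answer each question in one sentence:\n{numbered}"
print(prompt)  # send this as a single API call instead of three
```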

6. Use Efficient Formatting

  • Remove superfluous whitespace, line breaks, or formatting that does not affect meaning.
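A one-liner covers the common case; note that collapsing whitespace is safe for plain prose but would mangle whitespace-sensitive input such as code:

```python
raw = "Please   summarise\n\n\n  this   text  for  me.   "

# Collapse runs of whitespace into single spaces; fewer characters
# generally means fewer tokens for prose (never do this to code).
clean = " ".join(raw.split())
print(repr(clean))  # 'Please summarise this text for me.'
```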

I hope this post gives you an idea of what a token is and how tokens are used to meter your LLM API usage. Next time, I will be back with a post comparing the notable LLM providers, such as OpenAI and Google, so you know exactly what you are getting before you pay for their services.

Until then, cheers,

Matt. 
