The Catch Behind Claude’s Cheaper AI Tokens

Key Takeaways

  • AI models like OpenAI’s ChatGPT and Anthropic’s Claude use different “tokenizers” to break down text.
  • Anthropic’s Claude models often create more tokens from the same text compared to OpenAI’s GPT models.
  • This difference can make Claude 20-30% more expensive in practice, despite its lower advertised input-token rates.
  • The number of extra tokens Claude generates varies depending on the type of content (e.g., more for code than for plain English).
  • This token difference also affects how much information fits into the models’ “context window,” potentially making Claude’s larger window less spacious in effect.

When comparing leading AI models like OpenAI’s ChatGPT family and Anthropic’s Claude family, it’s easy to focus on the advertised price per token. But there’s a hidden factor: how they actually count those tokens.

Different AI models use different systems, called tokenizers, to chop up text into pieces they can understand. It turns out these systems don’t always count the same way.

Recent analysis highlighted by VentureBeat reveals that Anthropic’s Claude models tend to create significantly more tokens from the exact same piece of text compared to OpenAI’s GPT models, like GPT-4o.

As of June 2024, Claude 3.5 Sonnet boasted a 40% lower cost for input tokens than GPT-4o, making it look like a bargain. However, because its tokenizer often generates more tokens, the total cost for processing the same information can end up being higher.

Experiments showed that running identical tasks could cost 20-30% more using Claude compared to GPT-4o, primarily because Claude needed more tokens to represent the same prompts.
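The arithmetic behind that result is worth making explicit. Below is a minimal sketch using illustrative per-million-token rates in the spirit of the June 2024 list prices (a 40% input discount, identical output rates) and a ~30% token inflation; all numbers are assumptions for illustration, not current pricing:

```python
def request_cost(in_rate: float, out_rate: float,
                 in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of one request; rates are $ per million tokens."""
    return (in_rate * in_tokens + out_rate * out_tokens) / 1_000_000

# Illustrative rates (assumptions, in the spirit of June 2024 list prices):
# Model A: $5/M input, $15/M output; Model B: $3/M input (40% lower), $15/M output.
# Assume B's tokenizer emits ~30% more tokens for the same text (code-heavy case).
mult = 1.3
in_a, out_a = 10_000, 10_000                      # tokens as counted by model A
cost_a = request_cost(5, 15, in_a, out_a)
cost_b = request_cost(3, 15, int(in_a * mult), int(out_a * mult))

print(f"A: ${cost_a:.2f}  B: ${cost_b:.2f}  ratio: {cost_b / cost_a:.2f}")
# -> B costs ~17% more under these assumptions, despite the 40% input discount.
```

Because the discount applies only to input while output is billed at the same rate in this scenario, the extra tokens on both sides of the request can more than cancel the input discount, especially on output-heavy workloads.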

This “tokenizer inefficiency” isn’t uniform. The difference is more noticeable with certain types of text. For standard English articles, Claude might generate about 16% more tokens than GPT-4o.

But for more structured content, like mathematical formulas or Python code, the gap widens considerably. Claude’s tokenizer produced 21% more tokens for math and a significant 30% more for code compared to GPT-4o in tests.

This happens because Anthropic’s system seems to break down technical symbols and code structures into smaller, more numerous pieces (tokens).

Beyond direct costs, this token difference has implications for the models’ “context window” – the amount of information they can consider at once. Anthropic advertises a large 200,000-token window for Claude, versus OpenAI’s 128,000 for GPT-4o.

However, if Claude needs more tokens to encode the same information, its effective usable window is smaller than the headline figure suggests, narrowing the gap with GPT-4o.
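One way to make this concrete is to re-express the advertised window in "GPT-4o-equivalent" tokens by dividing out the inflation factor. A rough sketch using the ~16% English-text figure (the function name and numbers here are illustrative estimates, not vendor specifications):

```python
def effective_window(advertised_tokens: int, inflation: float) -> int:
    """Advertised context size re-expressed in 'compact tokenizer' units."""
    return int(advertised_tokens / inflation)

# ~16% more tokens for ordinary English text (an estimate, not a spec).
claude_effective = effective_window(200_000, 1.16)
gpt4o_window = 128_000

print(f"{claude_effective:,} GPT-4o-equivalent tokens vs {gpt4o_window:,}")
```

Still larger than GPT-4o's window, but closer than the 200,000 vs 128,000 headline implies; with the ~30% code figure, the effective size drops to roughly 154,000.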

OpenAI’s GPT models use a well-known method called Byte Pair Encoding (BPE), and their specific tokenizer (o200k_base for GPT-4o) is openly available. Information about Anthropic’s tokenizer is less public, though some reports suggest it uses a smaller vocabulary than OpenAI’s.
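BPE builds its vocabulary by repeatedly merging the most frequent adjacent pair of symbols. A toy, character-level sketch of that merge loop (real tokenizers such as tiktoken’s o200k_base operate on bytes with a pre-learned merge table; this is illustrative only):

```python
from collections import Counter

def bpe_merges(text: str, num_merges: int) -> list[str]:
    """Toy byte-pair encoding: repeatedly merge the most frequent
    adjacent pair of symbols. Illustrative only -- not a real tokenizer."""
    symbols = list(text)
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merged, i = [], 0
        while i < len(symbols):
            # Greedily replace each occurrence of the chosen pair.
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

print(bpe_merges("low lower lowest", 3))
```

A vocabulary with fewer learned merges stops combining characters sooner, which is one plausible reason a tokenizer would emit more, smaller tokens for the same text.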

For businesses using these AI models extensively, understanding these tokenization differences is vital. The lower per-token price offered by Anthropic might be offset by the higher token counts, especially when working with technical documents or code.

Therefore, evaluating the true cost requires looking beyond the sticker price and considering how each model actually processes the specific type of text relevant to your needs.

