OpenAI Token Counting
OpenAI’s GPT models – from GPT-4o Mini through GPT-5.4 – all use tokenizers from the tiktoken library. If you’ve worked with OpenAI’s API, you know that every request you send gets billed by token count. Understanding how those tokens are calculated helps you predict costs and stay within context limits.
GPT models use a BPE (Byte Pair Encoding) tokenizer. In practice, this means English text averages about 4 characters per token. But that’s just an average – common words like “the” or “is” are single tokens, while technical jargon or uncommon words might be split across three or four tokens. Code tends to tokenize less efficiently than prose, especially when it’s full of variable names and special syntax.
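The ~4 characters/token average suggests a quick back-of-envelope estimator. This is a rough sketch, not tiktoken — `estimate_tokens` is a hypothetical helper, and the 4.0 divisor is just the English-prose average stated above (code and jargon will run shorter per token):

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token average for English prose."""
    return math.ceil(len(text) / chars_per_token)

prose = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(prose))  # 44 chars -> 11 estimated tokens
```

Treat the result as a planning figure only; for billing-accurate counts, use a real tokenizer.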
GPT Model Pricing and Limits
| Model | Context | Max Output | Input $/1M | Output $/1M |
|---|---|---|---|---|
| GPT-5.4 | 256K | 32K | $10.00 | $30.00 |
| GPT-4o | 128K | 16K | $2.50 | $10.00 |
| GPT-4o Mini | 128K | 16K | $0.15 | $0.60 |
GPT-4o Mini is an excellent pick for high-volume tasks where you don’t need the full reasoning power of GPT-5.4. At $0.15 per million input tokens, you can process enormous volumes of text for pennies.
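To make "pennies" concrete, here's a cost sketch using the GPT-4o Mini rates from the table above. The batch size and per-document token counts are made-up assumptions for illustration:

```python
# Prices from the table above, USD per 1M tokens (GPT-4o Mini).
INPUT_PRICE = 0.15
OUTPUT_PRICE = 0.60

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request (or batch) at GPT-4o Mini rates."""
    return (input_tokens / 1_000_000 * INPUT_PRICE
            + output_tokens / 1_000_000 * OUTPUT_PRICE)

# Hypothetical batch: 10,000 documents, ~2,000 input and ~200 output tokens each.
docs = 10_000
total = request_cost(docs * 2_000, docs * 200)
print(f"${total:.2f}")  # $4.20 for 20M input + 2M output tokens
```

Swap in the table's GPT-4o or GPT-5.4 rates to compare the same workload across models.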
Optimizing Token Usage with OpenAI
A few things that help cut token costs with GPT models:
- Use prompt caching. OpenAI caches identical prompt prefixes, so repeated system instructions don’t get re-billed at full price.
- Pick the smallest model that works. GPT-4o Mini handles classification, extraction, and simple generation surprisingly well.
- Keep your system prompt tight. Every token in your system prompt gets charged on every request. Shaving 500 tokens off a system prompt saves real money at scale.
- Use structured outputs. Requesting JSON mode with a schema often produces shorter, more predictable responses than free-form text.
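To put the system-prompt bullet in numbers, here's a back-of-envelope sketch at the GPT-4o input rate from the table; the daily request volume is an assumed figure, not a benchmark:

```python
# Savings from trimming 500 tokens off a system prompt, at GPT-4o input
# pricing ($2.50 per 1M tokens, from the table above). The request volume
# is an assumed figure for illustration.
INPUT_PRICE_PER_M = 2.50
tokens_saved = 500
requests_per_day = 100_000

daily_savings = tokens_saved * requests_per_day / 1_000_000 * INPUT_PRICE_PER_M
print(f"${daily_savings:.2f}/day")  # $125.00/day at this volume
```

At higher volumes, or on pricier models, the same 500-token trim scales linearly.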
For exact token counts in production, use OpenAI’s tiktoken Python library, which ships the same BPE vocabularies the models use. This tool gives you fast estimates for planning and cost comparison.