There's an obvious objection here: token costs have plummeted since 2023. If the underlying cost keeps dropping, shouldn't flat-rate pricing get more sustainable, not less? Why introduce a custom currency at all?
Token prices for non-frontier models are still trending down, but two forces that most organizations underestimate cut against that logic.
First, context depth is growing faster than token prices are falling. Average sequence lengths in production LLM workloads have exploded, driven by agentic inference, tool calling, and longer conversation histories. A single MCP tool call can spawn dozens of sub-calls, and subtle prompt changes can radically shift token consumption, often invisibly to the consumer. Context windows have grown from the 4K-8K token range in 2023 to the 128K-1M range today, and filling more of that window costs more regardless of per-token rate improvements. Cheaper tokens don't help if tokens per operation are climbing faster than the per-token price is falling.
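To make the arithmetic concrete, here is a minimal sketch of cost per operation as tokens per operation times the per-token price. The function name and every number below are hypothetical and chosen only for illustration; they are not measurements from any real workload or price list.

```python
def cost_per_operation(tokens_per_op: int, usd_per_million_tokens: float) -> float:
    """Return the USD cost of one operation at a flat per-token rate."""
    return tokens_per_op * usd_per_million_tokens / 1_000_000


# Illustrative assumption: the per-token price falls 5x
# while tokens consumed per operation grow 25x.
before = cost_per_operation(tokens_per_op=2_000, usd_per_million_tokens=10.0)
after = cost_per_operation(tokens_per_op=50_000, usd_per_million_tokens=2.0)

print(f"before: ${before:.4f} per operation")
print(f"after:  ${after:.4f} per operation")
print(f"change: {after / before:.1f}x")
```

Under these assumed inputs, the cost of a single operation rises about fivefold even though each token is five times cheaper, which is the squeeze a flat-rate plan has to absorb.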
Second, and more strategically, a credit currency isn't just a cost pass-through. It's a pricing surface that can map to any unit of value the business chooses: compute time, API requests, agent actions, data processed, and outcomes delivered are all equally valid. Organizations that lock pricing to raw tokens are betting that tokens stay the right proxy for value indefinitely. A custom currency keeps that question open. As AI capabilities evolve and the link between model cost and business value gets less direct, that abstraction layer is exactly what preserves pricing flexibility.
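One way to picture that abstraction layer is a rate card that converts whatever the business decides to meter into credits. The sketch below is an assumption-laden illustration, not a reference design: the metric names, the credit rates, and the `credits_for` helper are all hypothetical.

```python
from typing import Mapping

# Hypothetical rate card: credits charged per unit of each billable metric.
RATE_CARD = {
    "agent_action": 5.0,
    "api_request": 1.0,
    "gb_processed": 20.0,
}


def credits_for(usage: Mapping[str, float], rate_card: Mapping[str, float]) -> float:
    """Convert metered usage into credits using the given rate card."""
    unknown = set(usage) - set(rate_card)
    if unknown:
        raise KeyError(f"no credit rate defined for: {sorted(unknown)}")
    return sum(quantity * rate_card[unit] for unit, quantity in usage.items())


# Hypothetical usage for one billing period.
usage = {"agent_action": 12, "api_request": 300, "gb_processed": 1.5}
print(credits_for(usage, RATE_CARD))
```

The point of the design is that repricing means editing the rate card, or adding a new metric to it, while customers keep buying and spending the same credits. Raw tokens could be one entry in the card without being the only one.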