10 votes

AI tokens are getting more expensive

2 comments

  1. skybrian

    The headline is kind of misleading - they’re not saying the per-token cost is increasing. They’re saying people are using a lot more tokens.

    This is certainly true when using AI for coding, but I’m not sure it’s true for consumers in general. How often do you need a deep research query outside work?

    To continue the car metaphor, it’s like buying an SUV you rarely take off-road or a truck where you rarely use the bed.

    I see the GPT5 release as consumer-oriented; it’s about pleasing people who can’t be bothered to figure out which model has the best performance and only occasionally need it. (Compare with automatic transmission.) It was a bit rough at launch, but presumably the model-switching will improve.

    8 votes
  2. SloMoMonday

    To give some context if you're not familiar with the elements of LLMs: tokens are how models break information down into component parts for analysis. They're also what a model uses to generate responses.

    A useful metric when costing AI usage is the cost per million tokens in/out. Tokens in are your context: prompts, supporting data, system prompts. Tokens out are the responses the model generates.
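    As a rough sketch of how that metric works in practice (the prices below are made up for illustration, not any vendor's real rates):

    ```python
    # Hypothetical per-million-token prices -- illustrative only, not real vendor rates.
    PRICE_IN_PER_M = 3.00    # dollars per 1M input tokens
    PRICE_OUT_PER_M = 15.00  # dollars per 1M output tokens

    def request_cost(tokens_in: int, tokens_out: int) -> float:
        """Cost of a single request given its input and output token counts."""
        return (tokens_in / 1_000_000) * PRICE_IN_PER_M \
             + (tokens_out / 1_000_000) * PRICE_OUT_PER_M

    # A chat turn that re-sends 8,000 tokens of context and generates a 1,200-token reply:
    print(f"${request_cost(8_000, 1_200):.4f}")  # $0.0420
    ```

    Note that output tokens are typically priced several times higher than input tokens, which is why verbose replies matter so much for the provider's bill.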

    Pure analytical models are fairly cheap: you feed in query + transformation + format, and they perform the required operations and spit out an answer. Reasoning models take in your prompt, the entire conversation, the rules, and a hundred other things; then they formulate an answer and have to generate several hundred tokens to present it in some vaguely human way. Over time, most models' output sizes have inflated insanely.
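    A quick bit of arithmetic shows why "the entire conversation" part is so expensive: each chat turn re-sends the whole history as input, so billed input tokens grow roughly quadratically with conversation length. The token counts below are assumptions for illustration.

    ```python
    # Assumed token counts, for illustration only.
    SYSTEM_PROMPT = 500   # tokens of system prompt / rules, re-sent every turn
    PER_TURN_IN = 200     # tokens in a typical user message
    PER_TURN_OUT = 600    # tokens in a typical model reply

    def total_input_tokens(turns: int) -> int:
        """Total input tokens billed across a conversation of the given length."""
        total = 0
        history = SYSTEM_PROMPT
        for _ in range(turns):
            history += PER_TURN_IN
            total += history          # the whole history is billed as input each turn
            history += PER_TURN_OUT   # the reply joins the history for the next turn
        return total

    print(total_input_tokens(1), total_input_tokens(10))  # 700 43000
    ```

    One turn costs 700 input tokens; ten turns cost 43,000 -- over sixty times as much for ten times the questions.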

    Logic dictates that more tokens equals more cost. But most LLM providers offer flat-rate pricing and unlimited tokens in the hope that token costs keep dropping by impossible factors. Those costs are not dropping in any meaningful way, and what's worse, models are generating more tokens across more users.

    With that out the way, my thoughts:
    My current strategy in communicating the unacceptable risks posed by LLM SaaS dependency is finding AI bros who are also panicking about the state of the industry. Mostly because I'm tired of feeling like the ignored scientist in the disaster movie, and it's a lot more meaningful when the optimists are afraid.

    I've had the same core concerns for well over a year and over time that list has only increased. So while I will not trust an AI bro with my life or company, I trust them to want their money printer to keep running (and hopefully print money some day).

    To give proper credit, I found this article through this video by a guy who's running his own LLM web service. He does a good job breaking down the cost escalation through practical examples and gives his insight on the state of the industry.

    He is rightfully worried. He shows that there are buttons on his service that generate extra cost for no material benefit. If he's using the method I think he is, all that button probably did was increase the token budget and generate longer replies. The illusion of higher cognitive capability, achieved by changing three variables.
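    A minimal sketch of what that kind of "premium" button might do under the hood, assuming the method the commenter suspects. The parameter names (`max_tokens`, `temperature`, `reasoning_effort`) are generic placeholders, not any specific vendor's API:

    ```python
    # Illustrative sketch only: a "high cognition" toggle that never changes
    # the model, just turns three request knobs up.
    BASE_REQUEST = {
        "model": "same-model-either-way",
        "max_tokens": 1024,
        "temperature": 0.7,
        "reasoning_effort": "low",
    }

    def premium_request(base: dict) -> dict:
        """The 'premium' button: same model, three variables changed."""
        upgraded = dict(base)
        upgraded["max_tokens"] = 4096          # longer replies -> more output tokens billed
        upgraded["temperature"] = 1.0          # chattier, not smarter
        upgraded["reasoning_effort"] = "high"  # more hidden reasoning tokens billed
        return upgraded

    # The underlying model is identical; only the billing changes.
    assert premium_request(BASE_REQUEST)["model"] == BASE_REQUEST["model"]
    ```

    The user pays for a longer, more elaborate answer from the same model, which is the "extra cost for no material benefit" he's describing.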

    The conspiratorial side of me can't help but bring this all back to the train wreck that is GPT5. GPT4.5 is one of the highest cost-per-token models because it was the sales pitch and siren song. Altman has been spouting prophecies about miracle hardware and revenue multipliers for years, and I think he was counting on crowdsourcing his AI dream future. But there was no miracle hardware, and no one was able to drastically increase these systems' capabilities. And his retail customers could not use his toy to generate endless revenue, no matter how many tried. It was a glorified Google replacement and imaginary friend. Now they need to stop the bleeding.

    All I see in GPT5 is a bad MoE system designed to enforce strict controls by setting restrictive resource allocations. And it's probably trained to identify benchmarking and will throw more than enough resources at it to ace it. Q1 '26 is going to be a bloodbath, because I can't see OpenAI pulling a Tesla and delaying forever. Too many people care and are directly affected.

    4 votes