10 votes

AI tokens are getting more expensive

3 comments

  1. skybrian
    The headline is kind of misleading - they’re not saying the per-token cost is increasing. They’re saying people are using a lot more tokens.

    This is certainly true when using AI for coding, but I’m not sure it’s true for consumers in general. How often do you need a deep research query outside work?

    To continue the car metaphor, it’s like buying an SUV you rarely take off-road or a truck where you rarely use the bed.

    I see the GPT5 release as consumer-oriented; it’s about pleasing people who can’t be bothered to figure out which model has the best performance and only occasionally need it. (Compare with automatic transmission.) It was a bit rough at launch, but presumably the model-switching will improve.

    8 votes
    1. milkywayflyinginsect

      Yeah, it's not that the tokens are getting more expensive, it's that reasoning models are putting out more tokens, and because of that, overall AI costs have actually gone up despite past predictions that everything would get cheaper. This is unsustainable for businesses that offer unlimited AI usage for a flat subscription fee.

      > This is certainly true when using AI for coding, but I’m not sure it’s true for consumers in general. How often do you need a deep research query outside work?

      While that may be true for deep research, I don't think it's true for reasoning models generally. I think we will rely more and more on reasoning models as we go into the future. They're just so much more reliable and capable, and most people will always go for the best models; it's one of the points the author makes. I almost never use the non-reasoning models for medium+ difficulty tasks, only for easy questions.

      Reasoning models used to be very unconversational, unnatural-sounding, and robotic, with weird formatting, and that is still the case for a lot of them, but Google's Gemini 2.5 Pro bridged that gap. I don't know what they did, but I remember having that wow moment when using it. It was a big leap. I almost exclusively use it now.

      I think non-reasoning models will still exist and have a place but most people will just be using the reasoning ones.

      By the way, there's a YouTuber who went through this article and also shared his experience (he runs an AI business as well); I found it insightful how costly Grok 4 is.
      https://youtu.be/mRWLQGMGY80

      3 votes
  2. SloMoMonday

    To give some context if you're not familiar with the elements of LLMs: tokens are how language models deconstruct information into its component parts for analysis. They're also what a model uses to generate responses.
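
    As a rough illustration (this sketch assumes Python and OpenAI's open-source tiktoken tokenizer; other models split text differently):

    ```python
    # pip install tiktoken -- OpenAI's open-source tokenizer library
    import tiktoken

    # cl100k_base is the encoding used by the GPT-4-era models
    enc = tiktoken.get_encoding("cl100k_base")

    tokens = enc.encode("Tokens are how models deconstruct text.")
    print(len(tokens))         # a short sentence becomes a handful of integer IDs
    print(enc.decode(tokens))  # decoding the IDs round-trips the original string
    ```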

    A useful metric when costing AI usage is the cost per million tokens in/out. Tokens in are your context: prompts, supporting data, system prompts. Tokens out are the responses the model generates.
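
    The arithmetic is simple; here's a minimal sketch (the rates are made-up placeholders, not any provider's actual pricing):

    ```python
    def request_cost(tokens_in: int, tokens_out: int,
                     rate_in_per_m: float, rate_out_per_m: float) -> float:
        """Cost of one request, given per-million-token rates for input and output."""
        return (tokens_in / 1_000_000) * rate_in_per_m + \
               (tokens_out / 1_000_000) * rate_out_per_m

    # Hypothetical rates: $3 per million tokens in, $15 per million tokens out.
    # Output is usually priced several times higher than input.
    print(round(request_cost(10_000, 500, 3.00, 15.00), 4))  # 0.0375 -> about 4 cents
    ```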

    Pure analytical models are fairly cheap: you feed in query + transformation + format, and the model performs the required operations and spits out an answer. Reasoning models take in your prompt, the entire conversation, the rules, and 100 other things; then they formulate an answer and have to generate several hundred tokens to present it in some vaguely human way. Over time, most models' output sizes have become insanely inflated.
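
    To make that concrete with the same made-up rates as above (the token counts are invented for illustration):

    ```python
    RATE_IN, RATE_OUT = 3.00, 15.00  # hypothetical $/million-token rates

    def cost(tokens_in, tokens_out):
        return tokens_in / 1e6 * RATE_IN + tokens_out / 1e6 * RATE_OUT

    print(f"terse answer:     ${cost(2_000, 100):.4f}")     # $0.0075
    print(f"reasoning answer: ${cost(10_000, 2_000):.4f}")  # $0.0600 -- 8x, mostly output
    ```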

    Logic dictates that more tokens equals more cost. But most LLM providers offer flat-rate pricing with unlimited tokens, in the hope that token costs keep dropping by impossible factors. Those costs are not dropping in any meaningful way, and what's worse, models are generating more tokens across more users.
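
    A back-of-the-envelope sketch of why that math breaks (every number here is hypothetical):

    ```python
    # Hypothetical flat-rate subscription economics for a single heavy user.
    subscription     = 20.00  # $/month flat fee
    cost_per_request = 0.06   # the reasoning-model request from the sketch above
    requests_per_day = 30

    monthly_cost = cost_per_request * requests_per_day * 30  # = $54.00
    print(f"provider margin: ${subscription - monthly_cost:.2f}")  # $-34.00 per user
    ```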

    With that out of the way, my thoughts:
    My current strategy for communicating the unacceptable risks posed by LLM SaaS dependency is finding AI bros who are also panicking about the state of the industry. Mostly because I'm tired of feeling like the ignored scientist in the disaster movie, and it's a lot more meaningful when the optimists are afraid.

    I've had the same core concerns for well over a year, and over time that list has only grown. So while I will not trust an AI bro with my life or company, I trust them to want their money printer to keep running (and hopefully print money some day).

    To give proper credit, I found this article through this video by a guy who runs his own LLM web service; he does a good job breaking down the cost escalation through practical examples and gives his insight on the state of the industry.

    He is rightfully worried. He shows that there are buttons on his service that generate extra cost for no material benefit. If he's using the method I think he is, all that button probably did was increase the token budget and generate longer replies: the illusion of higher cognitive capabilities from changing 3 variables.
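
    If that's right, the whole "enhanced" toggle might amount to something like this (the parameter names are entirely hypothetical, not any real provider's API):

    ```python
    # Purely speculative sketch of what an "enhanced mode" button could do.
    BASE = {"max_output_tokens": 1_000, "reasoning_budget": 2_000, "verbosity": "low"}

    def enhanced_mode(params: dict) -> dict:
        """Bump three knobs; same model underneath, just more billable tokens."""
        return {**params,
                "max_output_tokens": params["max_output_tokens"] * 4,
                "reasoning_budget":  params["reasoning_budget"] * 4,
                "verbosity": "high"}

    print(enhanced_mode(BASE))  # longer, wordier replies -- and a bigger token bill
    ```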

    The conspiratorial side of me can't help but bring this all back to the train wreck that is GPT5. GPT4.5 was one of the highest cost-per-token models because it was the sales pitch and siren's song. Altman has been spouting prophecies about miracle hardware and revenue multipliers for years, and I think he was counting on crowdsourcing his AI dream future. But there was no miracle hardware, and no one was able to drastically increase these systems' capabilities. And his retail customers could not use his toy to generate endless revenue, no matter how many tried. It was a glorified Google replacement and imaginary friend. Now they need to stop the bleeding.

    All I see in GPT5 is a bad MoE system designed to enforce strict controls by setting restrictive resource allocations. And it's probably trained to identify benchmarking and will throw more than enough resources at it to ace the test. Q1 2026 is going to be a bloodbath, because I can't see OpenAI pulling a Tesla and delaying forever. Too many people care and are directly affected.
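
    To illustrate what I mean by that kind of routing (a toy sketch; nobody outside OpenAI knows how the real router works):

    ```python
    # Toy router sketch -- purely illustrative, not how GPT5 actually routes.
    def route(prompt: str, budget_remaining: float) -> str:
        looks_like_benchmark = "think step by step" in prompt.lower()  # crude stand-in
        if looks_like_benchmark:
            return "big_reasoning_model"  # throw resources at anything that smells like an eval
        if budget_remaining < 1.00:
            return "cheap_small_model"    # quietly degrade once the allocation runs thin
        return "mid_tier_model"
    ```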

    4 votes