How much of this is user demand, I wonder, and how much is “we decided on your behalf to put an LLM summary at the top of every search”?
LLM summaries and corporate hype "force AI into everything and figure out the details later" has gotta be well over half of the demand.
Or say, opting everyone in to Gemini in GMail automatically: https://www.malwarebytes.com/blog/news/2025/11/gmail-is-reading-your-emails-and-attachments-to-train-its-ai-unless-you-turn-it-off
For Google specifically, hard to tell. But in general, AI summaries and the like are probably very popular. ChatGPT has 700 million weekly active users and the website is now ranked #5 worldwide, suggesting people actually go and use the chat. Some amount of that use case will overlap with what traditionally would have been Google's domain.
One has to wonder how much of that demand is genuine demand driven by useful applications, etc. And how much is demand from AI being bolted onto every single application by decree of C-suites with a major case of FOMO.
I'd also be curious to see what all these datacenters will be used for if the bubble does burst.
I do think the headline sounds worse than it is. Doubling the "serving capacity" is very different than doubling the compute costs themselves. Serving (ie. inference) is only a tiny portion of compute in comparison to training. They could likely double serving capacity a dozen times before the cost was even comparable.
Also, since training is going to happen regardless, it actually makes sense to me to get more use out of these existing models by increasing their number of users. The costs aren't much higher, and it at least justifies the enormous investment.
That said, I am certainly concerned about climate change, and do want to see these companies invest in clean energy to power their own energy-hungry server farms. They should be funding projects in nuclear, solar, geothermal, wind, and even hydro if the region is appropriate. They could also do more to reclaim the waste heat they're already producing, and to create closed-loop cooling systems.
It seems the problem is that everybody has a "winner takes all" mentality. The only way to stem that is with government regulation. I would support legislation that requires all companies over a certain market cap to invest in clean energy alongside their own AI build-out. It might not be "fair", but it is necessary.
Unfortunately, it's very unlikely to happen in the current political climate. Particularly as it would give China and other countries seriously investing in AI a leg up in the race.
Personally, I don't think that's going to matter in the long run. Models are becoming commoditized, and the open-source community has shown that even smaller startups can compete. It seems likely to me that we're all going to run headfirst into a technological wall, eventually, at which point there will be no meaningful difference between models.
Overall I agree with where you’re coming from, but I was surprised when I saw some OpenAI cost estimates recently: they’re scaling the user base to the extent that training spend is only a bit more than double the inference spend. So still a good amount more, for sure, but less of a gap than I expected.
That is also less of a gap than I'd have expected. Google may not have the same user numbers as OpenAI, but they do have a wider product range. I'm sure they're working within a similar order of magnitude at least.
Thanks for putting some real numbers to my supposition!
Serving (ie. inference) is only a tiny portion of compute in comparison to training. They could likely double serving capacity a dozen times before the cost was even comparable.
My understanding is that it's the other way around: training is massively expensive up front, but over time the compute cost of inference is far larger. From a computer science perspective it's hard to imagine it could be otherwise.
One way to contextualize it is to think about running a local model. You need a high-end GPU with a lot of memory to even consider it, and it will fully use all of the available resources for quite a long time on every request. It's more resource use than almost anything else you might do with your system, even most video editing/creation. Then think about what that looks like over the course of a day's work, with dozens or hundreds of prompts.
A frontier model being served to the public from the cloud is using orders of magnitude more compute than a local model, for every request from every user. For example, every Google search. Over time, training costs can't compete with that.
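To put some very rough numbers on that, here's a back-of-the-envelope sketch using the common ~2·N-FLOPs-per-token rule of thumb for a dense transformer forward pass. The parameter counts, especially the frontier one, are placeholder assumptions rather than published figures:

```python
# Back-of-the-envelope compute per request: local vs. frontier model.
# Rule of thumb: a dense transformer forward pass costs roughly 2 * N FLOPs
# per generated token, where N is the parameter count. The parameter and
# token counts below are illustrative assumptions, not published figures.

def flops_per_request(params: float, tokens: int) -> float:
    """Approximate FLOPs to generate `tokens` tokens with an N-parameter model."""
    return 2 * params * tokens

local_params     = 7e9    # e.g. a 7B model on a single consumer GPU
frontier_params  = 1e12   # hypothetical ~1T-parameter frontier model
tokens_per_reply = 500    # assumed length of a typical response

local    = flops_per_request(local_params, tokens_per_reply)
frontier = flops_per_request(frontier_params, tokens_per_reply)
print(f"local:    {local:.1e} FLOPs per reply")
print(f"frontier: {frontier:.1e} FLOPs per reply")
print(f"ratio:    ~{frontier / local:.0f}x")
```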
Inference would certainly overtake training in costs, but only if training is ever stopped. Right now, to the best of my knowledge, it continues as an ongoing process. They're constantly working on different modalities (video, speech, spatial models, etc), and creating new checkpoints on existing models to update their knowledge cutoffs. They're also creating many specialized models for specific applications (medical, scientific, research, etc).
Inference costs will certainly go up as user demand does, and it seems likely that it will at some point pass training, but I don't think that's going to be for a while yet. Granted, the formula may change as new techniques are discovered that boost the efficiency of one side or the other.
Regarding the Google Search preview, that model is bad. I can only assume they're running a barebones LLM at a very low cost, due to the scale. The "AI Mode" feature they offer to switch to is much more sophisticated.
It wouldn't surprise me if the rest of the search page is actually more expensive to generate than that LLM snippet is.
Surprisingly, I can't find any information about training vs inference from sources other than blogs. However, the bloggers that have something to say about it all seem to say the same thing; here's one.
Either way, however the comparison looks, the compute needed for inference is growing really fast and that's going to have a big carbon footprint.
I think that post may be an ad for LMCache. :) But yes, to their opening claim:
Running AI models (inference) over time costs much more than training them once.
I think that's certainly true. It's just that, as I mentioned above, training is still an ongoing investment at this time. If/when models are ever "good enough" and they stop training, the race will certainly be over.
Their graph shows that for open-source models (Llama, DeepSeek), it would take a few years worth of inference to match the initial training costs. For OpenAI, it's reversed. The older GPT-3 had higher inference costs almost immediately, whereas GPT-4 took about 1 year to match. I wonder if that indicates a change in efficiency for OpenAI's models over time.
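If you want to play with that crossover yourself, the break-even math is trivial; the dollar figures below are placeholders I made up, not numbers from the post:

```python
# Break-even between a one-time training cost and ongoing inference spend.
# Both dollar figures are hypothetical placeholders, not estimates from the post.

training_cost_usd      = 100_000_000   # assumed one-time training run
inference_cost_per_day = 250_000       # assumed daily serving bill

breakeven_days = training_cost_usd / inference_cost_per_day
print(f"Cumulative inference passes training after ~{breakeven_days:.0f} days "
      f"(~{breakeven_days / 365:.1f} years)")
```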
The math is made a little more complicated by the fact that models are now grown out from existing model checkpoints, rather than reset at each generation. The cost of training "GPT-5" is hard to quantify in isolation.
Additionally, models are now used to train other models. They can be used to synthesize training data and judge responses, assisting in training future generations. They are also sometimes used to distill smaller models, which is more targeted than quantizing down a larger model. So the total cost of something like Google's Gemma is likely considerably smaller than that of Gemini.
Anyway, I don't disagree with your main thrust which is that inference costs are going up over time, and there is an associated carbon footprint. It's something I'd like to see addressed as well.
frontier model being served to the public from the cloud is using orders of magnitude more compute than a local model
I think this can't be accurate - if for no other reason than that the big players are all using dedicated AI chips instead of graphics cards now. Consider also how much capability can run on a phone, when optimized for that - there's just no way that frontier models are using more compute resources to serve requests than it would take you to have an unoptimized local LLaMA do it on your gtx 6070.
Any LLM you can run locally is making massive compromises to allow you to do so - the fact they still work after dropping 90% of the parameters and quantising the rest by a factor of four is actually super impressive to me, but if you want to see the compute it takes to run a full fat model, just try running flagship DeepSeek in its native bf16. And DeepSeek’s market-shaking breakthrough was its efficiency, too; the broad understanding as far as I’m aware is that the major closed source models are even more resource intensive in the interest of getting better results.
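To make those compromises concrete, here's a rough weights-only memory comparison. The 671B total parameter count is DeepSeek's published figure; the 70B 4-bit distill is just one example of the kind of thing people actually run locally:

```python
# Rough memory footprint: a frontier-class model at full precision vs. a
# quantized local distill. 671B is DeepSeek-V3/R1's published total parameter
# count; the 70B 4-bit distill is one example of what people run locally.
# Weights only -- KV cache and activations add more on top.

def weights_gb(params: float, bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 1e9

print(f"flagship DeepSeek, bf16: ~{weights_gb(671e9, 16):,.0f} GB of weights")
print(f"70B distill, ~4-bit:     ~{weights_gb(70e9, 4):,.0f} GB of weights")
```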
As I understand it more or less everyone is still running NVIDIA chips, too? I know Google’s been pushing the TPUs for a long while, and we’re getting general tensor coprocessors in more hardware as time goes on, but there’s a reason NVIDIA’s datacenter division is basically driving the entire S&P500 nowadays.
It is NVIDIA, but it's not traditional graphics cards anymore. Everyone is on or moving to Blackwell (Nvidia's AI chips) or comparable products.
I could easily be wrong about the degree of difference, but the compute needed to handle a request to a frontier model is definitely dramatically higher than that needed for a local model. The two things aren't even in the same ballpark, even accounting for better hardware on the datacenter end.
My understanding is that when doing inference, these large systems can run multiple queries at once in the same batch to improve efficiency. However, how big these batches typically are for the major AI labs is unknown.
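Here's a toy sketch of what that batching looks like; the forward call is a stand-in, since the real serving stacks (and their typical batch sizes) aren't public:

```python
# Toy illustration of batched inference: queued prompts are padded to the same
# length and pushed through a single forward pass, amortizing the cost of
# loading the model weights across many users. `forward` is a stand-in for a
# real model call.
import numpy as np

PAD_ID = 0

def forward(batch: np.ndarray) -> np.ndarray:
    # Placeholder: return one "next token" per row, just to show the shapes.
    return batch.max(axis=1)

def serve_batch(prompts: list[list[int]]) -> np.ndarray:
    longest = max(len(p) for p in prompts)
    batch = np.full((len(prompts), longest), PAD_ID, dtype=np.int64)
    for row, tokens in enumerate(prompts):
        batch[row, : len(tokens)] = tokens   # left-align, pad the rest
    return forward(batch)                    # one pass serves every user

queued = [[17, 42, 7], [99, 3], [5, 5, 5, 5, 1]]   # three users' prompts
print(serve_batch(queued))                          # three results, one pass
```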
In a recent thread I talked about how AI is becoming genuinely useful, and it is. But that doesn't negate its environmental impact, which I don't think we talk about enough.
Think about the math of doubling every six months. Then add in all of the other tech giants, who are shovelling just as much money and resources at the problem. That includes (to a somewhat lesser degree) the newer, AI-only players like Anthropic. And then also consider that the doubling they're talking about is only for serving capacity. There's also massive and ever-growing compute needed for training.
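For what it's worth, the arithmetic behind those two figures lines up: doubling every six months is ten doublings in five years.

```python
# Doubling every 6 months = two doublings per year.
for years in range(1, 6):
    print(f"after {years} year(s): {2 ** (2 * years):>5,}x today's serving capacity")
# Year 5 lands at 1,024x -- which is where "the next 1000x in 4-5 years" comes from.
```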
It's mind blowing, and concerning.
From the article:
At an all-hands meeting on Nov. 6, Amin Vahdat, a vice president at Google Cloud, gave a presentation, viewed by CNBC, titled “AI Infrastructure,” which included a slide on “AI compute demand.” The slide said, “Now we must double every 6 months.... the next 1000x in 4-5 years.”
“The competition in AI infrastructure is the most critical and also the most expensive part of the AI race,” Vahdat said at the meeting, where Alphabet CEO Sundar Pichai and CFO Anat Ashkenazi also took questions from employees.
The presentation was delivered a week after Alphabet reported better-than-expected third-quarter results and raised its capital expenditures forecast for the second time this year, to a range of $91 billion to $93 billion, followed by a “significant increase” in 2026. Hyperscaler peers Microsoft, Amazon and Meta also boosted their capex guidance, and the four companies now expect to collectively spend more than $380 billion this year.
Clearly, and unsurprisingly, big tech isn't concerned about climate impact; they just don't want to be late to the gold rush. From Pichai:
He then reiterated a point he’s made in the past about the risks of not investing aggressively enough, and highlighted Google’s cloud business, which just recorded 34% annual revenue growth to more than $15 billion in the quarter. Its backlog reached $155 billion.
“I think it’s always difficult during these moments because the risk of underinvesting is pretty high,” Pichai said. “I actually think for how extraordinary the cloud numbers were, those numbers would have been much better if we had more compute.”
In other words: There is maybe no upper limit to what we should spend on increasing capacity so that we can capture every last ounce of value.
1000x in 4-5 years, potentially across all of the big tech companies. Even if the quoted VP was being hyperbolic, Pichai seems to think maybe the current goals aren't even enough, so it's probably not that hyperbolic.
Of course, there isn't anything any of us can do about the problem; the bigger part of the serving demand is driven by enterprise users, and it's a nation-state-level problem to solve. But I think we should at least be aware of it, and talking about it.
In conversations about server and datacenter environmental impact, the question often comes up of whether or not it's a big enough problem to be worried about. I would say that at 1000x in 5 years it definitely is something to be concerned about. The goal, after all, is to be decreasing our carbon footprint rather than multiplying it.
As this comment seems to be illustrating, sitting sadly down here at the bottom of the thread... When it comes to LLMs, people aren't particularly interested in talking environmental impact. It's not as exciting as the tech, or the implications of the tech on society. And I get it, the tech is insane, for better and worse.
I personally believe the climate considerations are pretty important. Not just in the case of AI, which is just the biggest example, but with every advancement. Even now in 2025, the carbon footprint part of the conversation often gets crowded out. Climate is a big part of a lot of conversations these days, but it's a long way from being a big enough part. As long as that's true, politicians will be less motivated to care.
Somewhat related... Where I live, each year a foundation allocates money to local high schools under the premise that students should decide which nonprofits/causes to support with it. This year, out of about 100 causes students chose to support, 2 were about the environment or climate change. It just doesn't seem to be in the zeitgeist as much as you'd hope.
The future is hard to predict, but I don't think today's rate of growth in AI usage will keep doubling for five years, and I don't take that 1000x projection seriously. It's only a model. Hockey-stick graphs tend to turn into S-curves.
For a historical comparison, Internet usage in the US was doubling until 1996 or so, but after that, growth was more linear.
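The numbers make the point quickly: with the same early growth rate, an exponential and a logistic (S-shaped) curve look identical at first, and then the logistic flattens. The rate and ceiling below are arbitrary, purely for illustration:

```python
# Exponential vs. logistic (S-curve) growth with the same early growth rate.
# The 40%-per-step rate and the ceiling of 1000 are arbitrary illustrations.

r, ceiling = 0.4, 1000.0
exp_val = log_val = 1.0

for step in range(25):
    if step % 4 == 0:
        print(f"t={step:2d}  exponential={exp_val:10.1f}  logistic={log_val:8.1f}")
    exp_val += r * exp_val                            # keeps compounding
    log_val += r * log_val * (1 - log_val / ceiling)  # flattens near the ceiling
```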
From the end of the article:
This story is updated to more precisely reflect Amin Vahdat’s comments on the need to meet demand by both increasing capacity and improving efficiency.
Some Googlers are going to be motivated to find ways to improve efficiency. It looks good at promotion time when you can multiply a 0.1% improvement by a very large number to demonstrate a cost savings that's many times your salary. There's no physical law preventing better results using less computing power.
Meanwhile, usage goes up. But with enough efficiency improvements, serving a lot more AI results might not be as expensive for Google as you'd expect. The efficiency improvements enable more growth with lower capital costs and energy usage.
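A tiny sketch of that tradeoff, assuming an arbitrary 20% efficiency gain per half-year: the capacity (and energy) actually needed grows far more slowly than the headline demand number.

```python
# Demand doubles every 6 months, but cost per query also falls each period.
# The 20% per-half-year efficiency gain is an arbitrary assumption.

demand = cost_per_query = 1.0
for half_year in range(1, 11):            # 10 six-month periods = 5 years
    demand *= 2.0
    cost_per_query *= 0.8
    if half_year % 2 == 0:
        capacity = demand * cost_per_query
        print(f"year {half_year // 2}: demand {demand:6.0f}x, "
              f"capacity needed {capacity:6.1f}x")
# ~1024x the queries after 5 years, but "only" ~110x the compute at this rate.
```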