5 votes

How to build an AI data center

3 comments

  1. skybrian

    From the blog post:

    Today, large data centers can require 100 megawatts (100 million watts) of power or more. That’s roughly the power required by 75,000 homes, or needed to melt 150 tons of steel in an electric arc furnace. Power demand is so central, in fact, that data centers are typically measured by how much power they consume rather than by square feet (this CBRE report estimates that there are 3,077.8 megawatts of data center capacity under construction in the US, though exact numbers are unknown). Their power demand means that data centers require large transformers, high-capacity electrical equipment like switchgears, and in some cases even a new substation to connect them to transmission lines.

    I got a chance to go on a tour once and it was quite impressive, even many years ago.

    Because of the amount of power they consume, substantial effort has gone into making data centers more energy efficient. A common data center performance metric is power usage effectiveness (PUE), the ratio of the total power consumed by a data center to the amount of power consumed by its IT equipment. The lower the ratio, the less power is used on things other than running computers, and the more efficient the data center.

    Data center PUE has steadily fallen over time. In 2007, the average PUE for large data centers was around 2.5: For every watt used to power a computer, 1.5 watts were used on cooling systems, backup power, or other equipment. Today, the average PUE has fallen to a little over 1.5. And the hyperscalers do even better: Meta’s average data center PUE is just 1.09, and Google’s is 1.1. These improvements have come from things like more efficient components (such as uninterruptible power supply systems with lower conversion losses), better data center architecture (changing to a hot-aisle, cold-aisle arrangement), and operating the data center at a higher temperature so that less cooling is required.
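
    The PUE arithmetic in that passage can be sketched in a few lines of Python. This is only an illustration: the function name `overhead_per_it_watt` is made up here, and "a little over 1.5" is rounded down to 1.5.

```python
def overhead_per_it_watt(pue: float) -> float:
    """Watts spent on cooling, backup power, etc. per watt of IT load.

    PUE = total facility power / IT equipment power, so the non-IT
    share per IT watt is simply PUE - 1.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


# PUE figures quoted in the post (the current average is rounded, see above).
quoted_pues = [
    ("2007 average", 2.5),
    ("current average (approx.)", 1.5),
    ("Google", 1.1),
    ("Meta", 1.09),
]

for label, pue in quoted_pues:
    overhead = overhead_per_it_watt(pue)
    print(f"{label}: PUE {pue} -> {overhead:.2f} W of overhead per IT watt")
```

    A PUE of 1.1 works out to about 0.1 W of overhead per watt of IT load, which is where a figure of roughly 10% comes from.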

    So that’s about 10% overhead over what the computers themselves use. When I worked there, Google was quite proud of the work they did to improve datacenter efficiency. It was one of the many presentations I saw that invoked a feeling of “we’re not in Kansas anymore, are we?”

    After a while, you get used to it. And it seems that competitors are doing pretty well too.

    Because of these steady increases in data center efficiency, while individual data centers have grown larger and more power-intensive, power consumption in data centers overall has been surprisingly flat. In the U.S., data center energy consumption doubled between 2000 and 2007 but was then flat for the next 10 years, even as worldwide internet traffic increased by more than a factor of 20. Between 2015 and 2022, worldwide data center energy consumption rose an estimated 20 to 70%, but data center workloads rose by 340%, and internet traffic increased by 600%.
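
    To see what those 2015 to 2022 percentages imply per unit of work, here is a rough sketch. It assumes "rose by 340%" means workloads were multiplied by 4.4, and the function name `energy_per_workload_change` is hypothetical.

```python
def energy_per_workload_change(energy_growth: float, workload_growth: float) -> float:
    """Fractional change in energy used per unit of workload.

    Both arguments are fractional increases (0.20 means +20%).
    A negative result means each unit of workload got cheaper in energy terms.
    """
    return (1.0 + energy_growth) / (1.0 + workload_growth) - 1.0


WORKLOAD_GROWTH = 3.40  # +340%, as quoted in the post

# Low and high ends of the quoted 20 to 70% rise in energy consumption.
for energy_growth in (0.20, 0.70):
    change = energy_per_workload_change(energy_growth, WORKLOAD_GROWTH)
    print(f"energy +{energy_growth:.0%}, workloads +{WORKLOAD_GROWTH:.0%}: "
          f"energy per workload changed by {change:+.1%}")
```

    Under that reading, energy per unit of workload fell by roughly 61 to 73% over the period.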

    Today data centers are still a small fraction of overall electricity demand. The IEA estimates that worldwide data centers consume 1 to 1.3% of electricity as of 2022 (with another 0.4% of electricity devoted to crypto mining). But this is expected to grow over time. SemiAnalysis predicts that data center electricity consumption could triple by 2030, reaching 3 to 4.5% of global electricity consumption. And because data center construction tends to be highly concentrated, data centers are already some of the largest consumers of electricity in some markets. In Ireland, for example, data centers use almost 18% of electricity, which could increase to 30% by 2028. In Virginia, the largest market for data centers in the world, 24% of the power sold by Virginia Power goes to data centers.

    Power availability has already become a key bottleneck to building new data centers. Some jurisdictions, including ones where data centers have historically been a major business, are curtailing construction. Singapore is one of the largest data center hubs in the world, but paused construction of them between 2019 and 2022, and instituted strict efficiency requirements after the pause was lifted. In Ireland, a moratorium has been placed on new data centers in the Dublin area until 2028. Northern Virginia is the largest data center market in the world, but one county recently rejected a data center application for the first time in the county’s history due to power availability concerns.

    Data centers can be especially challenging from a utility perspective because their demand is more or less constant, providing fewer opportunities for load shifting and creating more demand for firm power.

    Part of the electrical infrastructure problem is a timing mismatch. Utility companies see major electrical infrastructure as a long-term investment to be built in response to sustained demand growth. Any new piece of electrical infrastructure will likely be used far longer than a data center might be around, and utilities can be reluctant to build new infrastructure purely to accommodate them. In some cases, long-term agreements between data centers and utilities have been required to get new infrastructure built. An Ohio power company recently filed a proposal that would require data centers to buy 90% of the electricity they request from the utility, regardless of how much they use. Duke Energy, which supplies power to North Carolina, has similarly introduced minimum take requirements for data centers that require them to buy a minimum amount of power.
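
    A minimum take requirement like the Ohio proposal can be modeled very roughly as follows. This is a sketch under a simplifying assumption that the customer pays for whichever is larger, actual usage or the minimum share of the requested amount. The function name `billed_energy_mwh` and the example numbers are hypothetical, and the actual tariff terms are not given in the post.

```python
def billed_energy_mwh(requested_mwh: float, used_mwh: float,
                      min_take_fraction: float = 0.90) -> float:
    """Energy the customer pays for under a simple minimum-take rule.

    Assumption: the bill covers the larger of actual usage and
    min_take_fraction * requested_mwh (90% in the Ohio proposal).
    """
    if not 0.0 <= min_take_fraction <= 1.0:
        raise ValueError("min_take_fraction must be between 0 and 1")
    return max(used_mwh, min_take_fraction * requested_mwh)


# Hypothetical customer that requested 1,000 MWh for the billing period.
for used in (400.0, 950.0):
    billed = billed_energy_mwh(1000.0, used)
    print(f"used {used:.0f} MWh -> billed for {billed:.0f} MWh")
```

    The point of such a rule is that a data center that overestimates its needs still pays for most of the capacity the utility built for it.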

    Data center builders are responding to limited power availability by exploring alternative locations and energy sources. Historically, data centers were built near major sources of demand (such as large metro areas) or major internet infrastructure to reduce latency. But lack of power and rising NIMBYism in these jurisdictions may shift their construction to smaller cities, where power is more easily available. Builders are also experimenting with alternatives to utility power, such as local solar and wind generation connected to microgrids, natural gas-powered fuel cells, and small modular reactors.

    What impact will AI have on data center construction? Some have projected that AI models will become so large, and training them so computationally intensive, that within a few years data centers might be using 20% of all electricity. Skeptics point out that historically increasing data center demand has been almost entirely offset by increased data center efficiency. They point to things like Nvidia's new, more efficient AI supercomputer (the GB200 NVL72), more computationally efficient AI models, and future potential ultra-efficient chip technologies like photonics or superconducting chips as evidence that this trend will continue.

    [W]hile Nvidia’s new GB200 NVL72 can train and run AI models more efficiently, it consumes much more power in an absolute sense, using an astonishing 120 kilowatts per rack. Future AI-specific chips may have even higher power consumption. Even if future chips are more computationally efficient (and they likely will be), they will still consume much larger amounts of power.

    Because current data centers aren’t necessarily well-suited for AI workloads, AI demand will likely result in data centers designed specifically for AI. SemiAnalysis projects that by 2028, more than half of data centers will be devoted to AI. Meta recently canceled several data center projects so they could be redesigned to handle AI workloads. AI data centers will need to be capable of supplying larger amounts of power to individual racks, and of removing that power when it turns into waste heat. This will likely mean a shift from air cooling to liquid cooling, which uses water or another heat-conducting fluid to remove heat from computers and IT equipment. In the immediate future, this probably means direct-to-chip cooling, where fluid is piped directly around a computer chip. This strategy is already used by Google’s tensor processing units (TPUs) designed for AI work and by Nvidia’s GB200 NVL72. In the long term, we may see immersion cooling, where the entire computer is immersed in a heat-conducting fluid.

    2 votes
  2. blivet

    Appallingly wasteful, especially considering it is all in service of a technology that no one asked for that solves a nonexistent problem.

    1 vote
    1. skybrian

      I'm not sure what you mean by "all," but this is how it goes with speculative investments. It's not just money, it's what the money is spent on.

      One example is huge numbers of cheap bikes that were manufactured when there was a fad for starting bike-sharing services. Cheap bikes could be useful for someone, but who?

      On the other hand there is the "dark fiber" that was built during the dot-com boom. I assume it all got used eventually.

      So the question is whether building more data centers really is a waste in physical terms.

      Electricity generation is pretty general-purpose, particularly if it's green energy, so new electricity generation probably isn't a waste. It can definitely be overbuilt, as happened in the 1960s and 1970s with nuclear power, but electric utilities are pretty conservative now because they've been burned before.

      Data centers are pretty reusable for any kind of computing. So are most servers, including GPUs and TPUs. Matrix multiplication is a pretty basic operation.

      So, if the boom ends tomorrow, there will be a lot of wasted effort, but I don't think the data centers go to waste? (Although servers do sometimes depreciate pretty quickly as better models come out.)

      2 votes