7
  
  Data Center Infrastructure
  The design and operations of data centers have evolved over time and will need to continue to adapt in order to meet the rising energy demands of artificial intelligence (AI) workloads and achieve greater flexibility in their power consumption. To examine these data center infrastructure design and operation issues, Ayse Coskun, Boston University, moderated a session focused on coupling flexible computing and infrastructure management, including cooling, with realistic performance and sustainability goals, ensuring scalability and responsiveness in the face of AI-driven demand. The panelists were Peter de Bock, Department of Energy (DOE); Ricardo Bianchini, Microsoft Azure; Adam Wierman, California Institute of Technology and Verrus; and Andrew Chien, University of Chicago.
PANELIST REMARKS
Advancing Efficient Cooling Systems
Running the AI chips to support society’s seemingly insatiable demand for AI applications requires a great deal of energy, and de Bock highlighted that a large portion of that energy demand comes from technologies used to keep information technology (IT) equipment cool. As the energy density of each server rack increases, this challenge only grows more acute. As a result, DOE has identified cooling technologies as an area ripe for efficiencies and innovations to alleviate the load on the grid and impacts on communities. de Bock described DOE’s Advanced
Research Projects Agency–Energy Cooling Operations Optimized for Leaps in Energy, Reliability, and Carbon Hyperefficiency for Information Processing Systems (COOLERCHIPS) initiative, which aims to develop high-performance cooling systems for energy-efficient data centers to support a clean energy future.1 Fifteen teams across industry and academia are working to reduce cooling energy needs by approximately 90 percent by designing innovative, reliable cooling systems that reduce strains on the grid and conserve limited resources such as energy and water, allowing more efficient use of grid connections. Ten of the awards are focusing on secondary cooling loop technologies; three are focusing on modular data center designs; one is advancing cooling, energy, and reliability tools; and one is focused on facilities and standards.
Evolving Data Center Energy Challenges in the Artificial Intelligence Era
Bianchini discussed how different AI processes influence the evolving energy demands of data centers. As other speakers noted, data center electricity demand is growing rapidly and could triple or more by 2030.2 While efficiency efforts and innovations such as virtualization, cloud computing, and improved power usage effectiveness (PUE) successfully reduced energy demand in the past, Bianchini said that new advances are needed in the AI era.
The two main workloads associated with AI, training and inference, have different power needs. Training is performed in batches with high spikes in power use to support massive synchronous computation, a pattern that causes enormous power swings that can destabilize the grid (Figure 7-1). Bianchini added that the handling of hardware failures during training, a necessary step in building large language models (LLMs), adds to the energy intensiveness of training because execution must repeatedly be rolled back to the last checkpoint and resumed.
By contrast, inference consumes much less power because it is an interactive computation with lower utilization. However, its two phases, the prompt phase and the token phase,3 behave very differently, creating a challenge to optimize for these different behaviors. Another challenge is maximizing power oversubscription while retaining high inference performance.
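The oversubscription trade-off Bianchini described can be illustrated with a toy calculation (all power figures below are invented for illustration, not Azure data): provisioning racks for typical rather than nameplate peak draw fits more servers under a fixed power budget, at the cost of needing power capping if peaks coincide.

```python
# Toy power-oversubscription sketch: how many servers fit under a fixed
# facility power budget when provisioning for peak vs. typical draw?
# All figures are hypothetical illustrations, not measurements.

BUDGET_W = 1_000_000   # 1 MW facility budget (hypothetical)
PEAK_W = 1_200         # nameplate peak draw per server (hypothetical)
TYPICAL_W = 800        # typical sustained draw per server (hypothetical)

conservative = BUDGET_W // PEAK_W       # provision for the worst case
oversubscribed = BUDGET_W // TYPICAL_W  # assume peaks rarely coincide

# If every oversubscribed server peaked simultaneously, the excess would
# have to be shed by power capping (e.g., throttling CPU/GPU frequency):
worst_case_w = oversubscribed * PEAK_W
excess_kw = (worst_case_w - BUDGET_W) / 1_000

print(f"Servers without oversubscription: {conservative}")
print(f"Servers with oversubscription:    {oversubscribed}")
print(f"Worst-case excess to shed: {excess_kw:.0f} kW")
```

In this sketch, oversubscription hosts roughly 50 percent more servers in the same power envelope, which is the performance upside; the downside is the capped performance whenever actual draw approaches the budget.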
___________________
1 Advanced Research Projects Agency-Energy, n.d., “COOLERCHIPS,” https://arpa-e.energy.gov/technologies/programs/coolerchips, accessed April 21, 2025.
2 D. Patel, D. Nishball, and J. Eliahou Ontiveros, 2024, “AI Datacenter Energy Dilemma—Race for AI Datacenter Space,” Semi Analysis, March 13, https://semianalysis.com/2024/03/13/ai-datacenter-energy-dilemma-race.
3 The prompt phase is when a model processes input, while the token phase is when the model draws on memory to generate a response.
 
FIGURE 7-1 (not reproduced). SOURCE: Ricardo Bianchini, Microsoft Azure, presentation to the workshop, November 12, 2024.
Toward a Dynamic and Cooperative Relationship Between Data Centers and the Grid
Chien highlighted recent trends in AI data center energy consumption and discussed opportunities to increase grid capacity and efficiency through more dynamic and cooperative relationships between data centers and the grid. The scale of computing power has grown 50-fold over recent decades, and the associated energy demand has continued to skyrocket despite what Chien called “radical” improvements in the energy efficiency of IT infrastructure.4 The consequences of the current scale of data center operations, he said, include energy swings that destabilize the grid and power demands that far exceed its capabilities, neither of which can be addressed by energy efficiencies alone.
“I’m not going to dispute that energy efficiency is important, but it’s not going to save us,” Chien stated. Instead, he posited that new grid infrastructure and modernizations are needed to keep up with this growing demand, and that more capacity is needed, especially for renewable energy. In addition, he emphasized the importance of flexibility in data center energy needs in order to balance loads with the variable nature of renewable energy, which is increasingly being integrated into a grid that lacks storage or load management strategies to optimally handle variable renewable generation.5 Echoing comments made by other speakers, Chien highlighted how data centers can act as a test case for grid modernization to support broader electrification trends. If their loads can be made dynamic and flexible, it is possible to support expanded grid capacity and increased efficiency while reducing costs, even as AI data centers continue to expand.6 “Data centers, we believe, are actually—as the first leg of this electrification—the first model case for how you deal with these large adaptive loads effectively for the grid,” Chien said. “And it turns out that if you can make your large, adaptive loads flexible, you can actually give a lot of benefits for the grid.”
Making data centers more flexible is challenging, however, because they are cost-optimized capital investments that have to meet service-level objectives. Emerging ideas to increase efficiency include optimized scheduling, hyperscaling, and power management strategies. Looking forward,
___________________
4 Goldman Sachs, 2024, “AI Is Poised to Drive 160% Increase in Data Center Power Demand,” May 14, https://www.goldmansachs.com/insights/articles/AI-poised-to-drive-160-increase-in-power-demand.
5 Energy Information Administration, 2025, “Short-Term Energy Outlook,” https://www.eia.gov/outlooks/steo.
6 L. Lin, R. Wijayawardana, V. Rao, H. Nguyen, E.W. Gnibga, and A.A. Chien, 2024, “Exploding AI Power Use: An Opportunity to Rethink Grid Planning and Management,” Pp. 434–441 in The 15th ACM International Conference on Future and Sustainable Energy Systems, e-Energy ’24, https://doi.org/10.1145/3632775.3661959.
Chien stressed the need for solutions to prevent data centers from becoming a “thundering herd” that crashes the grid or stresses its capacity to deliver clean energy. He added that fairness is also important, recognizing that large and small customers share the same grid and should share its costs and benefits equitably. To achieve this, he suggested that it may be necessary to incentivize flexibility by controlling load size or shape or by encouraging consumption commitments that minimize the strain on the grid.
Designing Metrics to Drive Progress
Highlighting the role of metrics in driving progress, Wierman posited that new metrics are needed to support energy efficiency as AI data centers expand. The creation of PUE as a metric drove data centers’ dramatic efficiency improvements in the past; while PUE is fairly simple to measure and validate, it can also be manipulated.7 Two alternative metrics have been proposed: carbon usage effectiveness (CUE), the ratio of carbon emissions to IT energy usage, and water usage effectiveness (WUE), the ratio of water usage to IT energy usage.
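As a rough numerical illustration of how these three ratios are computed (the facility totals below are hypothetical, not figures presented at the workshop):

```python
# Illustrative effectiveness ratios for a hypothetical data center.
# PUE = total facility energy / IT equipment energy (ideal value: 1.0)
# CUE = total carbon emissions / IT equipment energy
# WUE = total water usage / IT equipment energy

def pue(total_facility_kwh: float, it_kwh: float) -> float:
    return total_facility_kwh / it_kwh

def cue(total_co2_kg: float, it_kwh: float) -> float:
    return total_co2_kg / it_kwh   # kg CO2e per IT kWh

def wue(total_water_l: float, it_kwh: float) -> float:
    return total_water_l / it_kwh  # liters per IT kWh

# Hypothetical annual totals for one facility:
it_kwh = 100_000_000        # 100 GWh of IT load
facility_kwh = 130_000_000  # adds cooling, power delivery, lighting
co2_kg = 45_000_000         # emissions attributed to the facility
water_l = 180_000_000       # liters of water consumed

print(f"PUE = {pue(facility_kwh, it_kwh):.2f}")
print(f"CUE = {cue(co2_kg, it_kwh):.2f} kg/kWh")
print(f"WUE = {wue(water_l, it_kwh):.2f} L/kWh")
```

Note that PUE improves (falls toward 1.0) only when overhead energy shrinks relative to IT energy, whereas CUE and WUE also respond to the carbon intensity of the power supply and to cooling design, which is why they surface inefficiencies that PUE misses.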
CUE and WUE have not yet been widely adopted, but Wierman said that they have already proven useful in identifying inefficiencies and driving improvements in data center infrastructure designs. However, they do not capture everything; for example, other metrics may be needed to address factors such as workload patterns, synchronicity, flexibility, grid integration, operational versus embedded carbon, and the role of power purchase agreements versus onsite renewable generation.
Many communities have expressed resistance to hosting data centers; Wierman noted that concerns about noise, utility impacts, and increased emissions and pollution can complicate siting and construction and constrain AI growth.8,9,10,11 Perhaps even more important are the significant health impacts data centers can inflict on communities. Even with the lower-end estimates of data center growth, the health costs related to data center emissions for the United States as a whole are projected to exceed those associated with on-road emissions in the state of California by 2030. These health costs rival data centers’ electricity costs and are particularly high in low-income communities. In light of these concerns and impacts of data center expansion, Wierman challenged attendees to consider what metrics might be most useful in driving progress toward cleaner and more just practices in the future.
___________________
7 R. Bianchini, C. Belady, and A. Sivasubramaniam, 2024, “Data Center Power and Energy Management: Past, Present, and Future,” IEEE Micro 44(5):30–36, https://doi.org/10.1109/MM.2024.3426478.
8 P. Pineda, 2021, “Unsustainable, Resource-Hungry and Loud: Why Chandler Wants to Ban More Data Centers,” AZ Central, November 22, https://www.azcentral.com/story/news/local/chandler/2021/11/22/chandler-wants-ban-more-data-centers-after-years-complaints/8627569002.
9 D. Geiger and E. Thomas, 2024, “Electric Grids Need Upgrades Thanks to Data Centers. Guess Who Helps Pay for That,” Business Insider, April 18, https://www.businessinsider.com/data-centers-electricity-consumers-discounts-utilities-2024-4.
10 C. O’Donovan, 2024, “Fighting Back Against Data Centers, One Small Town at a Time,” Washington Post, October 5, https://www.washingtonpost.com/technology/2024/10/05/data-center-protest-community-resistance.
11 E. Halper, 2024, “A Utility Promised to Stop Burning Coal. Then Google and Meta Came to Town,” Washington Post, October 12, https://www.washingtonpost.com/business/2024/10/08/google-meta-omaha-data-centers.
PANEL DISCUSSION
Participants discussed a wide range of opportunities for evaluating and optimizing data center infrastructure to address impacts on the grid, local communities, and the environment.
Opportunities for Optimizing Data Centers
Coskun asked the panelists to name what they see as the most important changes to data center design and operation needed to meet growing AI demand. Underscoring the need for a more holistic approach to data center design, de Bock suggested viewing a data center as essentially one large computer whose every element, from a single chip to the entire building, must be fully optimized for energy efficiency. Current metrics fail to capture this full spectrum, and even the term “data center” lacks a clear definition, but he suggested that lessons could be gleaned from other fields, such as aerospace reliability metrics, to inform new and more holistic approaches. He added that there could also be opportunities to redesign AI algorithms for increased reliability to improve efficiency and minimize failures and energy-intensive resets. Bianchini agreed but noted that optimizing every layer of a data center poses an enormous challenge: cooling systems will need to shift from air cooling to liquid cooling methods (e.g., cold plates for direct-to-chip cooling), power-delivery systems will need to change as rack power densities grow exponentially, the power swings of large-scale training need to be mitigated, and AI models themselves need efficiency improvements. Every layer presents unique problems and opportunities for innovation.
Wierman added that the uncertainty and rapid growth in AI makes these tasks even more daunting and highlighted the value of building more flexibility into every design element. Chien agreed that new metrics should be designed to incentivize flexibility and sustainability, adding that solutions need to be cost-effective to ensure wide implementation.
Interacting with the grid today is not cost-effective, and while that is an enormous challenge, there are solutions. For example, data centers that have flexible loads and span multiple municipalities can more easily harmonize with multiple grids and leverage demand response programs.
Bianchini added that certain grid interactions require active “load shedding,” and optimizing data centers against this is another large challenge. Wierman said that cost-effective, flexibility-supporting infrastructure improvements, such as onsite electricity storage, can help to address this. Like de Bock, he pointed to aerospace for models and suggested using service-level agreements that specify reliability levels for particular situations. It may also be possible to create a new metric that quantifies a data center’s flexibility to enable better comparisons and inform decision-making.
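No flexibility metric of the kind Wierman described has been standardized; as a purely hypothetical sketch (not a proposal from any panelist), one could define a “flexibility index” as the fraction of current load a facility can shed or defer within a given notice window:

```python
# Hypothetical "flexibility index": the fraction of current load a data
# center could shed or defer within a given notice window. Illustrative
# sketch only; no such metric has been standardized.

from dataclasses import dataclass

@dataclass
class Workload:
    power_kw: float
    deferrable: bool       # can be paused or rescheduled (e.g., batch training)
    notice_minutes: float  # lead time needed to shed it safely

def flexibility_index(workloads: list[Workload], window_minutes: float) -> float:
    total = sum(w.power_kw for w in workloads)
    sheddable = sum(w.power_kw for w in workloads
                    if w.deferrable and w.notice_minutes <= window_minutes)
    return sheddable / total if total else 0.0

fleet = [
    Workload(400, deferrable=True, notice_minutes=5),   # batch training
    Workload(300, deferrable=False, notice_minutes=0),  # latency-critical inference
    Workload(100, deferrable=True, notice_minutes=60),  # maintenance jobs
]
print(flexibility_index(fleet, window_minutes=15))
```

A single number like this would enable the comparisons and decision-making Wierman mentioned, though a real metric would also need to capture how long load can stay shed and at what cost to service-level objectives.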
Given the large and growing number of chips in AI data centers, de Bock highlighted the importance of incorporating flexibility in cooling. In addition, he noted that it is difficult to ensure reliability when it is unclear if the grid can deliver the power needed. Data center owners may find it more cost-effective to build and control their own smaller power plants, which puts pressure on the grid and makes integration challenging but also has benefits, as having highly reliable power will enhance flexibility for meeting data centers’ dynamic loads. Building on this point, Bianchini noted that grid stability is just as important as cost-effectiveness, if not more so, especially as the grid integrates more renewable energy sources and attempts to handle more power swings. Data center siting can impact this stability, and de Bock described the current situation as akin to a “tightrope walker” balancing supply and demand on a grid that was not built for today’s needs.
Accounting for Data Center Impacts
Panelists discussed approaches for including various types of impacts of data centers in thinking about optimizing data center infrastructure moving forward. Laura Gonzalez Guerrero, Clean Virginia, asked about appropriate methods for accounting for externalities such as health impacts. As the technology industry races ahead to expand data center infrastructure, Bianchini said that it is important for utilities and governments to agree on how such externalities are measured and paid for. Chien expressed his view that utilities could likely price data centers’ energy costs differently without slowing down growth, as companies are already agreeing to pay above-market prices to ensure energy supply. Wierman agreed and noted that extra energy costs are considered nominal because the most important resources that data center owners consider are access to land and access to energy. Coskun noted that more research
is needed to better understand how the cost of operational electricity factors into data centers’ costs.
Coskun prompted panelists to elaborate on proposed metrics for efficiency, sustainability, and health costs. Wierman replied that the health cost metrics his group used were built on established models, such as those developed by the U.S. Environmental Protection Agency that link cases of asthma to incremental emissions, in order to better reflect how data centers impact communities and grids. He added that developing better metrics in this area would also improve siting and design negotiations between utilities and data center developers. Bianchini agreed with Wierman’s point that PUE is flawed and that changes or new metrics are needed. However, he said that there is no consensus on potential solutions, nor are there quality data to enact them. “We’re nowhere near having a good set of metrics,” he stated.
Chien agreed that better metrics are needed and added that advancing sustainability goals through carbon accounting poses another challenge, one that must be addressed across all of society, not by industry alone. He suggested that more direct attribution of emissions to their sources could help to incentivize behavior change. A further challenge is how flexibility plays out in different grid settings; on this issue, he said that two potential solutions are adding more capacity and changing generation duties. Adaptive loads can affect computing performance, however, and may require yet another metric to capture and manage that impact, but he expressed confidence that flexibility solutions are coming.
While he said that the focus on PUE has been instrumental in driving improvements, de Bock also agreed that new metrics are needed, especially to compare the many new cooling technologies. To this end, he said that the COOLERCHIPS initiative is creating better definitions of the PUE denominator and of rack density, and adding local weather inputs to test models in challenging climates. Metrics under consideration include total power usage effectiveness and IT power usage effectiveness. However, there is very little public reporting of water and electricity use, making it challenging to get a full picture of the growth that is happening and its possible impacts.
Interactions with the Grid
Data centers do not only cause negative impacts, de Bock noted; they also create tax revenue and other benefits for communities. Under new policies, they could also generate and share their own power through microgrids that add capacity to the larger grid. Bianchini noted that large cloud providers are already exploring the idea of microgrids in some cases. A participant asked if data centers could become “first-class” actors
on the grid, meaning they have some control and scheduling power over their load even during power swings, enhancing grid stability. Chien replied that the distinction between data centers as independent actors and as first-class actors speaks to the heart of the problem. Grid operations are designed to control electric power service while maintaining stability and resilience for the whole grid, rather than to provide resilience for any particular load. Since data centers are not primarily run for the benefit of the grid, having more control in the hands of these entities would not necessarily align with grid operators’ objectives, and resistance to data sharing and transparency on the part of data centers could make it difficult to accomplish this level of cooperation anyway.
Building on the points panelists made about the need for metrics that are relevant to key goals, Line Roald, University of Wisconsin–Madison, proposed using cost-usage effectiveness to capture the price that data centers pay in electricity markets. This metric can help data centers prioritize their workloads by reacting to price signals, which improves grid stability and communities’ understanding of their impacts, she said. Wierman expressed support for Roald’s suggestion and agreed that measuring data center–grid interactions could shed light on the services data centers provide and the cost impacts of data center flexibility. He noted that some European grids are experimenting with prioritizing interconnections depending on what a given data center is doing, and while enforcement could be challenging, it is important to consider new ideas as data centers create power constraints that affect activities across society. However, Chien said he is not sure such an approach would align with the United States’ open-market principles, which allow different valuations of an activity and different resource allocations. Critical infrastructure such as hospitals must certainly be prioritized, but he said that in commerce it is very hard to harmonize resource prioritization and allocation, especially globally.
Data Center Siting
Eric Masanet, University of California, Santa Barbara, asked what factors influence the siting of large training data centers. Bianchini replied that a cool climate, plentiful water, sufficient grid capacity, and a lack of competition over resources are needed. However, he cautioned that training should not be the focus, because the training infrastructure will be substantially smaller than the inference infrastructure and will not be easily monetized. Given the interactivity and latency constraints of inference, Bianchini and de Bock noted that the data centers to support this aspect of AI are unlikely to be located very far from the customer base. The siting priorities for data centers have changed over time, de Bock
added, evolving from a focus on where the best networks were, to where land was cheap, to where power is available. He also added that cooling is a solvable problem that will expand siting flexibility and enable data centers to operate in warmer climates, which is important because warmer places also tend to have more wind and solar power, which keeps energy prices low. He speculated that different types of data center designs may develop in response to differing operations, energy needs, and latency constraints.
Chien noted that inference and training are dynamic activities with different latency requirements. Today’s users are tolerating longer latencies, and combining that with emerging inference workloads creates more flexibility for data center locations.
Capturing Artificial Intelligence Costs and Benefits
A participant asked how metrics can reflect the trade-offs involved in AI’s impacts on the economy and society, both locally and globally. Chien suggested it may be too early to place a value on AI’s positive impacts because it can take years before adoption is widespread enough to produce visible productivity gains. Bianchini agreed that adoption can take decades; although AI is advancing much faster than other technologies and many benefits can be envisioned, it will be a long time before AI tools are fully established in daily life. de Bock stated that, in his view, one of AI’s most valuable contributions to date is natural language processing. While this capability enables people to interact more effectively with computers without specialized expertise, he acknowledged that its exact benefit is hard to measure. As to drawbacks, de Bock noted that many data centers use noisy, diesel-powered generators as backup power sources, highlighting the need for new technologies such as innovative battery designs. Wierman agreed that AI improvements come with costs, and it is important to be able to measure and balance both the benefits and the costs.
Advancing Flexibility
One participant asked panelists to comment on why flexibility is seen as so important when the future of AI is inference, which is latency sensitive and not especially flexible. Wierman replied that apart from workloads, there are many infrastructure opportunities for improving flexibility, such as oversubscription and onsite energy storage, which are becoming increasingly cost-effective as they scale up. Chien added that rather than turn workloads on or off, there are flexibility opportunities within the applications themselves, such as using smaller models or constraining computation when power prices rise. In addition, de Bock
suggested the potential for modularized data centers that are dynamically located and equipped according to an expanding reference design to address these issues, but he noted that they would need new fiber optic networks to offset energy or latency needs.
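Chien’s point about flexibility inside the applications themselves can be sketched as a toy policy that serves requests with a smaller model when power prices spike (the price thresholds and model names below are invented for illustration, not drawn from any panelist’s system):

```python
# Toy price-aware model selection: serve requests with a smaller model
# when the electricity price spikes. Thresholds and model names are
# hypothetical illustrations of in-application flexibility.

def choose_model(price_per_mwh: float) -> str:
    if price_per_mwh < 50:
        return "large-model"   # cheap power: full quality
    if price_per_mwh < 150:
        return "medium-model"  # moderate prices: trade some quality
    return "small-model"       # price spike: minimize energy per query

for price in (30, 90, 400):
    print(f"${price}/MWh -> {choose_model(price)}")
```

Because this shapes load without pausing the service, it offers a degree of flexibility even for latency-sensitive inference, at the cost of reduced answer quality during expensive hours.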
Additional Environmental Considerations
Erma Leaphart-Gooch, Sierra Club Michigan, asked panelists to comment on data centers’ impacts on source water and the potential to use geothermal energy to supplement other energy sources. While geothermal energy is low carbon and has large-scale potential, de Bock said that many steps remain before it will be ready for broader deployment in the United States. As for water, de Bock noted that some current data centers use water in outside evaporative cooling towers, but the COOLERCHIPS initiative could help to identify more sustainable innovations that would eliminate the need for water evaporation. Wierman agreed that data centers’ water use is an important issue and said that its impacts on the local environment and public health should be explicitly measured. Bianchini also agreed and noted that many hyperscale companies are moving away from evaporation towers.
 