Demand response could be done by covering the data center with battery energy or not. Demand response and batteries can stack: if the grid is really stressed, a data center can both turn off and discharge its battery into the grid.
Economically, it makes sense to accept some true downtime to avoid months-long delays in data center construction. This is clearly true for training workloads which are very important but don’t have live demand. But downtime for even inference clusters is acceptable: you can reduce the compute demand by temporarily slowing down token generation, or use dynamic rate limits. And any curtailment would almost certainly be isolated to one region, so inference data centers in other places would still be operational.
In any case, the paper says the curtailments would last about two hours each:
The average duration of load curtailment (i.e., the length of time the new load is curtailed during curtailment events) would be relatively short, at 1.7 hours when average annual load curtailment is limited to 0.25%, 2.1 hours at a 0.5% limit, and 2.5 hours at a 1.0% limit
Demand response could be done by covering the data center with battery energy or not. Demand response and batteries can stack: if the grid is really stressed, a data center can both turn off and discharge its battery into the grid.
Economically, it makes sense to accept some true downtime to avoid months-long delays in data center construction. This is clearly true for training workloads which are very important but don’t have live demand. But downtime for even inference clusters is acceptable: you can reduce the compute demand by temporarily slowing down token generation, or use dynamic rate limits. And any curtailment would almost certainly be isolated to one region, so inference data centers in other places would still be operational.
In any case, the paper says the curtailments would last about two hours each: