Google used DeepMind algorithms to dramatically boost energy efficiency in its data centers. More recent work adapts that approach to commercial buildings in general.
What’s new: Jerry Luo, Cosmin Paduraru, and colleagues at Google and Trane Technologies built a model that learned, via reinforcement learning, to control the chiller plants that cool large buildings.
Key insight: Chiller plants cool air by running it past cold water or refrigerant. They’re typically controlled according to heuristics that, say, turn on or off certain pieces of equipment if the facility reaches a particular temperature, including constraints that protect against damaging the plant or exposing personnel to unsafe conditions. A neural network should be able to learn more energy-efficient strategies, but it must be trained in the real world (because current simulations don’t capture the complexity involved) and therefore it must adhere rigorously to safety constraints. To manage safety, the model can learn to predict the chiller plant’s future states, and a hard-coded subroutine can deem them safe or unsafe, guiding the neural network to choose only safe actions.
How it works: The authors built separate systems to control chiller plants in two large commercial facilities. Each system comprised an ensemble of vanilla neural networks plus a safety module that enforced safety constraints. Training took place in two phases. In the first, the ensemble trained on data produced by a heuristic controller. In the second, it alternated between training on data it produced itself and data produced by the heuristic controller.
- The authors collaborated with domain experts to determine a chiller plant’s potential actions and states. Actions comprised 12 behaviors such as switching on a component or setting a water chiller’s temperature. States consisted of measurements taken every 5 minutes by 50 sensors (temperature, water flow rate, on/off status of various components, and so on). They also identified unsafe actions (such as setting the temperature of the water running through a chiller to below 40 degrees Fahrenheit) and unsafe states (such as a drop in ambient air temperature below 45 degrees Fahrenheit).
- Using reinforcement learning, the authors trained the ensemble on a year’s worth of data from the chiller plant’s heuristic controller, penalizing actions according to how much energy they consumed. Given an action, the ensemble learned to predict (i) the energy cost of that action and (ii) the plant’s resulting state 15 minutes later.
- For three months, they alternated between controlling the chiller plant using the ensemble for one day and the heuristic controller for one day. They recorded the actions and resulting states and added them to the training set. At the end of each day, they retrained the ensemble on the accumulated data. Alternating day by day made it possible to compare the performance of the ensemble and heuristic controller under similar conditions.
- During this period, the safety module blocked the system from taking actions that were known to be unsafe and actions the ensemble predicted to result in an unsafe state. Of the remaining actions, the ensemble predicted the one that would consume the least energy. In most cases, it took that action. Occasionally, it took a different action, so it could discover strategies that were more energy-efficient than those it learned from the heuristic controller.
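The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The two thresholds come from the examples in the text; everything else is an assumption made for illustration: the names `choose_action`, `make_toy_model`, `water_setpoint`, and `ambient_temp`, the rule that every ensemble member must predict a safe state, the averaging of predicted energy, the exploration probability, and the fallback when no action passes the filter.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

State = Dict[str, float]
Action = Dict[str, float]
# Hypothetical ensemble member: (state, action) -> (predicted energy, predicted next state).
Model = Callable[[State, Action], Tuple[float, State]]

# Thresholds taken from the examples in the text (degrees, as stated there).
MIN_WATER_SETPOINT = 40.0
MIN_AMBIENT_TEMP = 45.0


def action_is_safe(action: Action) -> bool:
    """Hard-coded rule: reject actions that are known to be unsafe."""
    return action["water_setpoint"] >= MIN_WATER_SETPOINT


def state_is_safe(state: State) -> bool:
    """Hard-coded rule: reject predicted states that are unsafe."""
    return state["ambient_temp"] >= MIN_AMBIENT_TEMP


def choose_action(
    state: State,
    candidates: Sequence[Action],
    ensemble: Sequence[Model],
    explore_prob: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Optional[Action]:
    """Pick a safe action, usually the one with the lowest predicted energy."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one model")
    rng = rng or random.Random()
    scored: List[Tuple[float, Action]] = []
    for action in candidates:
        if not action_is_safe(action):
            continue  # blocked: known-unsafe action
        predictions = [model(state, action) for model in ensemble]
        # Assumption: block the action if ANY member predicts an unsafe state.
        if not all(state_is_safe(next_state) for _, next_state in predictions):
            continue
        mean_energy = sum(energy for energy, _ in predictions) / len(predictions)
        scored.append((mean_energy, action))
    if not scored:
        return None  # assumption: the caller hands control to the heuristic controller
    if rng.random() < explore_prob:
        return rng.choice(scored)[1]  # occasional exploration among safe actions
    return min(scored, key=lambda pair: pair[0])[1]


def make_toy_model(bias: float) -> Model:
    """Toy stand-in for a trained network; the formulas are invented for the demo."""
    def model(state: State, action: Action) -> Tuple[float, State]:
        setpoint = action["water_setpoint"]
        energy = (setpoint - 44.0) ** 2 + 10.0 + bias
        next_state = {"ambient_temp": state["ambient_temp"] - 0.5 * (46.0 - setpoint)}
        return energy, next_state
    return model
```

Exploration happens only among actions that already passed both safety checks, so trying a non-greedy action never bypasses the safety module.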
Results: Alternating with the heuristic controller for three months in the two buildings, the authors’ method achieved energy savings of 9 percent and 13 percent, respectively, relative to the heuristic controller. The system also discovered strategies that were not obvious. For example, it learned to produce colder water, which consumed more energy up front but reduced overall consumption.
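For a sense of how a relative-savings figure can be computed from alternating-day logs, here is a rough Python sketch. It assumes each day is logged as a (controller, energy) pair and compares simple means; the function name `relative_savings`, the labels `"rl"` and `"heuristic"`, and the log format are hypothetical, and the authors' actual metric may differ (for instance, by normalizing for cooling load).

```python
from statistics import mean
from typing import Iterable, Tuple


def relative_savings(daily_log: Iterable[Tuple[str, float]]) -> float:
    """Fractional energy savings of the learned controller vs. the heuristic one.

    daily_log holds one (controller, energy_used) pair per day, where
    controller is "rl" or "heuristic" (hypothetical labels).
    """
    days = list(daily_log)
    rl_days = [energy for controller, energy in days if controller == "rl"]
    heuristic_days = [energy for controller, energy in days if controller == "heuristic"]
    if not rl_days or not heuristic_days:
        raise ValueError("need at least one day for each controller")
    # Savings = 1 - (mean learned-controller energy / mean heuristic energy).
    return 1.0 - mean(rl_days) / mean(heuristic_days)
```

A plain comparison of means like this does not adjust for day-to-day differences in weather or equipment condition.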
Yes, but: The environment within the buildings varied over the three-month period with respect to factors like temperature and equipment performance. This left the authors unable to tell how much improvement to attribute to their system versus confounding factors.
Why it matters: Using reinforcement-learning algorithms to control expensive equipment requires significant domain expertise to account for variables like sensor calibration, maintenance schedules, and safety rules. Working closely with domain experts when applying such algorithms can maximize both efficiency and safety.
We’re thinking: Deep learning is cooler than ever!
This story first appeared in the September 27, 2023 edition of The Batch.