[Question] Using Reinforcement Learning to try to control the heating of a building (district heating)

In short, we are trying to use Reinforcement Learning to control the heating of a building (district heating), using the building's zone temperatures and the outdoor temperature as inputs. To avoid training the RL algorithm on the real building, we use a building simulation program as the environment.

The building simulation program has inputs:

  • Zone thermostat heating and cooling setpoint (C)

  • Hot water pump flow rate.

Outputs from the building simulation program are:

  • Zone temperatures (C)

  • Outdoor temperature (C)

  • Hot water rate (kW)


The aim of the RL algorithm is to control the building's district heating use more efficiently than the current district heating control function does. The primary goal is for the RL algorithm to peak-shave the district heating use.
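To make "peak-shaving" measurable, here is a minimal sketch of how the peaks of a heating-rate series could be summarised. The function name `peak_metrics` and the choice of peak-to-average ratio as the metric are assumptions for illustration, not something our setup already computes.

```python
def peak_metrics(heating_rate_kw):
    """Return (peak, peak_to_average) for a series of heating rates in kW.

    Hypothetical helper: a lower peak and a lower peak-to-average ratio
    would both indicate better peak shaving.
    """
    if not heating_rate_kw:
        raise ValueError("heating_rate_kw must not be empty")
    peak = max(heating_rate_kw)
    mean = sum(heating_rate_kw) / len(heating_rate_kw)
    # Guard against division by zero when there is no heating at all.
    ratio = peak / mean if mean > 0 else float("inf")
    return peak, ratio
```

Running this on both the RL result and the measured data would give two numbers per series that can be compared directly.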


We are using ClippedPPO as the agent, within an RL framework. For comparison, we have one year of district heating data from the building we want to control. The building is modelled in the building simulation format.

Action space of the RL-algorithm is:

  • Hot water pump flow rate

  • Zone heating and cooling temperature setpoints (SP)

Observation space of the RL-algorithm is:

  • Zone Air temperature

  • Outdoor temperature, current and forecast (36 hours into future)

  • Heating rate of hot water
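As a rough illustration of the observation space listed above, the sketch below assembles one flat observation vector. The function name `build_observation`, the ordering of the entries, and the hourly resolution of the 36-hour forecast are assumptions for illustration only.

```python
def build_observation(zone_temps_c, outdoor_temp_c, outdoor_forecast_c,
                      heating_rate_kw, horizon_hours=36):
    """Assemble a flat observation vector (hypothetical layout).

    zone_temps_c       -- list of zone air temperatures (C)
    outdoor_temp_c     -- current outdoor temperature (C)
    outdoor_forecast_c -- forecast outdoor temperatures, assumed one per hour
    heating_rate_kw    -- current hot water heating rate (kW)
    """
    if len(outdoor_forecast_c) != horizon_hours:
        raise ValueError(
            f"expected {horizon_hours} forecast values, "
            f"got {len(outdoor_forecast_c)}"
        )
    return [*zone_temps_c, outdoor_temp_c, *outdoor_forecast_c, heating_rate_kw]
```

With two zones this gives a vector of length 2 + 1 + 36 + 1 = 40.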


In each timestep, the RL environment takes the outputs of the building simulation program and calculates a penalty from the observed state, which is returned to the agent. The penalty is the sum of 4 different parts, each with a coefficient that I have been tuning by hand (trial and error). Some of the parts are, for example, -coeff1*heating_rate^2, -coeff2*heating_derivative and -coeff3*uncomfortabletemp (a large penalty when the indoor temperature is below 19 C).
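A minimal sketch of the three penalty terms named above, to make the structure concrete. The coefficient values are placeholders, not our tuned ones, and two details are assumptions: the heating derivative is taken as the absolute change in heating rate between consecutive timesteps, and the discomfort term is taken as the number of degrees below 19 C. The fourth part of the penalty is left out here.

```python
def penalty(heating_rate_kw, prev_heating_rate_kw, zone_temp_c,
            coeff1=1e-4, coeff2=1e-2, coeff3=10.0, comfort_min_c=19.0):
    """Return the (negative) penalty for one timestep.

    All default coefficients are placeholder values for illustration.
    """
    # Assumption: derivative = absolute change since the previous timestep.
    heating_derivative = abs(heating_rate_kw - prev_heating_rate_kw)
    # Assumption: discomfort = degrees below the comfort limit, else zero.
    uncomfortable_temp = max(0.0, comfort_min_c - zone_temp_c)
    return -(coeff1 * heating_rate_kw ** 2
             + coeff2 * heating_derivative
             + coeff3 * uncomfortable_temp)
```

Because the first term is quadratic in the heating rate, its weight relative to the discomfort term changes a lot with the size of coeff1 and with the units of the heating rate, which is part of why the coefficients are hard to tune by hand.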


The problem is that the resulting heating still shows high peaks, which we want the RL algorithm to shave. If anyone has ideas on how to get this working, or any insight on how to progress, we would appreciate it.

The orange part is the RL-resulting hot water heating rate and the blue part is the real-world measured values for 2022: