You’re seeing high peaks in the real observation data, or in the current simulations of the RL model?
My main worry would be imprecision in the controls (you set the flow rate to X gpm but actually get more or less, by an amount that isn't predictable) and delays in impact (the time from starting to heat to seeing the air temperature change), which your simulation may be modeling too precisely or differently from the real world.
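To make that concern concrete, here is a minimal sketch of how a simulator could inject both effects. It assumes a single-zone lumped 1R1C thermal model, Gaussian actuator error and a fixed dead time measured in steps. The function name `simulate` and every parameter value are hypothetical and are not taken from the building simulation program discussed here.

```python
import random
from collections import deque


def simulate(setpoints, outdoor_temps, delay_steps=3, noise_frac=0.1,
             r=0.005, c=2.0e7, dt=3600.0, t0=20.0, seed=0):
    """Hypothetical 1R1C building model with noisy, delayed heat delivery.

    setpoints:     requested heating power per step in W (the agent's actions)
    outdoor_temps: outdoor temperature per step in degC
    delay_steps:   dead time before requested heat reaches the zone (assumed)
    noise_frac:    relative std dev of actuator error, requested vs delivered (assumed)
    r, c:          envelope resistance (K/W) and thermal capacitance (J/K) (assumed)
    """
    rng = random.Random(seed)
    # Heat that has been commanded but has not yet reached the room air.
    pipeline = deque([0.0] * delay_steps)
    temp = t0
    temps = []
    for requested, t_out in zip(setpoints, outdoor_temps):
        # Actuator imprecision: delivered power differs unpredictably from the request.
        delivered = max(0.0, requested * (1.0 + rng.gauss(0.0, noise_frac)))
        pipeline.append(delivered)
        q_heat = pipeline.popleft()
        # Explicit Euler step of C * dT/dt = Q - (T - T_out) / R.
        temp += dt / c * (q_heat - (temp - t_out) / r)
        temps.append(temp)
    return temps
```

If a trained policy only performs well with `noise_frac=0` and `delay_steps=0`, that would suggest it is exploiting a precision the real plant does not have.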
There is a training phase (1-6 years of weather observations) in which the RL agent trains using the building simulation program. I then evaluate the previously trained agent on the 2022 weather data using the same building simulation program. The graph also shows the building's real measured values from 2022 (blue line).
Yes. The building has a certain inertia, and this is something I hope the agent will learn as well. The 36-hour outdoor temperature forecast is included in the observation state so that the agent knows to preheat the building when the forecast temperature is dropping, which lowers the heating-peak penalty.
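One possible way to lay out such an observation is sketched below. It is hypothetical and assumes hourly forecast values in °C; the function name, the forecast-relative-to-now encoding, the cyclic hour features and the "coldest point ahead" summary feature are illustrative choices, not the setup described in the thread.

```python
import math
from typing import List, Sequence

FORECAST_HORIZON_H = 36  # 36-hour outdoor forecast, as described in the thread


def build_observation(indoor_temp: float, outdoor_temp: float,
                      forecast: Sequence[float], hour_of_day: int) -> List[float]:
    """Hypothetical observation vector for the heating agent.

    forecast: hourly outdoor temperature forecast in degC, next hour first.
    """
    if len(forecast) < FORECAST_HORIZON_H:
        raise ValueError(
            f"need {FORECAST_HORIZON_H} forecast hours, got {len(forecast)}")
    # Express the forecast relative to the current outdoor temperature, so an
    # approaching cold spell shows up as negative values in any season.
    deltas = [f - outdoor_temp for f in forecast[:FORECAST_HORIZON_H]]
    # Encode the hour cyclically so that 23:00 and 00:00 end up close together.
    angle = 2.0 * math.pi * (hour_of_day % 24) / 24.0
    return ([indoor_temp, outdoor_temp, math.sin(angle), math.cos(angle)]
            + deltas
            + [min(deltas)])  # summary feature (assumed): coldest point ahead
```

The explicit minimum is redundant with the raw deltas, but a summary like this can make an upcoming temperature drop easier for the policy to pick up early in training.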
I'd expect building inertia (and heater inertia, and water-flow and water-temperature inertia) to be important both for pre-heating effectively and for smoothing out spikes. The other factor in the spikes is probably the cost function: are you modeling constraints like a minimum heating run time and the increased maintenance caused by rapid cycling?
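As a rough illustration of that question, here is a hypothetical per-step cost that combines energy use, comfort violation, a peak penalty and a rapid-cycling penalty. All weights, the comfort band and the on/off threshold are assumptions. The incremental peak term (charging only when a step raises the running maximum, similar to a demand charge) is one design choice among several.

```python
from typing import Tuple


def step_cost(power_kw: float, prev_power_kw: float, indoor_temp: float,
              running_peak_kw: float, comfort_low: float = 20.0,
              comfort_high: float = 23.0, w_energy: float = 1.0,
              w_comfort: float = 10.0, w_peak: float = 5.0,
              w_switch: float = 0.5,
              on_threshold_kw: float = 0.1) -> Tuple[float, float]:
    """Hypothetical step cost; returns (cost, updated running peak in kW).

    Assumes a fixed step length, so power is proportional to energy per step.
    """
    energy = w_energy * power_kw
    # Degrees outside the (assumed) comfort band, on either side.
    deviation = (max(0.0, comfort_low - indoor_temp)
                 + max(0.0, indoor_temp - comfort_high))
    comfort = w_comfort * deviation
    # Penalize only the amount by which this step raises the running peak.
    peak = w_peak * max(0.0, power_kw - running_peak_kw)
    # Penalize on/off transitions to discourage rapid cycling.
    was_on = prev_power_kw > on_threshold_kw
    is_on = power_kw > on_threshold_kw
    switch = w_switch if was_on != is_on else 0.0
    new_peak = max(running_peak_kw, power_kw)
    return energy + comfort + peak + switch, new_peak
```

A hard minimum run time could alternatively be enforced in the environment itself, for example by ignoring "off" commands until the heater has been on for a set number of steps, rather than only discouraging short cycles through the cost.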