Canaletto comments on Daniel Kokotajlo’s Shortform

Canaletto 9 Jul 2025 7:45 UTC
1 point
Well, continual learning! But otherwise, yeah, it’s closer to undefined.
What happens after training ends is more like a free parameter here. “Do reward-seeking behaviors according to your reasoning about the reward allocation process” becomes undefined when there is no such process and the agent knows it.
Maybe it takes long shots at getting some reward anyway, or maybe it indulges in some correlate of getting reward. Maybe it just refuses to work if it knows there is no reward. (It has read all the acausal decision theory stuff, after all.)