My thought was that if lookahead improves performance during some period of training, it's liable to develop mesa-optimization during that period, and then find it useful for other things later on.