in practice, we seem to train the world model and understanding machine first, and the policy only much later, as a thin patch on top of the world model. this is not guaranteed to stay true but seems pretty durable so far. thus, the relevant heuristics are about base models, not about randomly initialized neural networks.
separately, I do think randomly initialized neural networks have some strong baseline of fuzziness and conceptual corrigibility, which is in a sense what it means to have a traversable loss landscape.