jan betley comments on Localizing goal misgeneralization in a maze-solving policy network

jan betley 16 Oct 2023 18:52 UTC
1 point
Thanks! Indeed, shard theory fits here pretty well. I didn’t think about that while writing the post.