Interpreting a Maze-Solving Network

Mechanistic interpretability of a pretrained policy network from the paper "Goal Misgeneralization in Deep Reinforcement Learning".