In the AlphaZero interpretability paper [1], Ctrl+F “Ruy Lopez” for an example where the model’s improvement in play quality was much faster than the corresponding human progress.
[1] https://arxiv.org/pdf/2111.09259.pdf
That’s within-training progress, measured by epoch/iteration, not progress across trained models by total size/compute. It’s not clear those are the same sort of thing at all, because you can get spikes trivially within a single training run from things like the learning rate dropping. Investigating whether there is any connection would be interesting.
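To make the learning-rate point concrete, here is a minimal sketch (my own toy example, not from the paper): gradient descent on a noisy quadratic with a step learning-rate schedule. When the learning rate drops, the loss falls abruptly, producing a "spike" of apparent progress that comes entirely from the schedule, not from the optimizer discovering anything qualitatively new.

```python
import random

random.seed(0)

def noisy_grad(x, noise=1.0):
    # Gradient of f(x) = x^2 / 2, plus noise the large learning rate can't average out.
    return x + random.gauss(0.0, noise)

x = 10.0
losses = []
for step in range(200):
    lr = 0.5 if step < 100 else 0.05  # step decay at iteration 100
    x -= lr * noisy_grad(x)
    losses.append(0.5 * x * x)

# Average loss just before vs. just after the learning-rate drop.
before = sum(losses[90:100]) / 10
after = sum(losses[110:120]) / 10
print(f"avg loss before LR drop: {before:.3f}")
print(f"avg loss after  LR drop: {after:.3f}")
```

By step 90 the iterate is just bouncing around a noise floor set by the large learning rate; the drop at step 100 shrinks that floor, so the loss curve shows a sudden step down even though nothing about the problem or the "knowledge" of the solution changed. Spikes like this are why within-run progress curves need careful interpretation before being compared with across-model scaling trends.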