Mateusz Bagiński comments on Basic Facts about Language Model Internals

Mateusz Bagiński 5 Jan 2023 6:46 UTC
4 points
It would be interesting to see how these change throughout training. AFAIK the GPT-2 models do not have released intermediate training checkpoints, but e.g. Pythia does, and it covers an even broader range of parameter sizes than GPT-2.
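
A rough sketch of how one might walk over Pythia checkpoints, in case it helps. It assumes the Hugging Face `transformers` library and that the checkpoints are exposed as `stepN` revisions of the `EleutherAI/pythia-70m` repo (step 0, powers of two up to 512, then every 1000 steps up to 143000), which is my understanding of the release. `weight_std_by_param` is a hypothetical stand-in for whatever statistic from the post one wants to track.

```python
def pythia_revisions():
    """Checkpoint revision names, assuming the schedule described above:
    step0, log-spaced steps 1..512, then every 1000 steps to 143000."""
    log_spaced = [2 ** i for i in range(10)]      # 1, 2, 4, ..., 512
    linear = list(range(1000, 143001, 1000))      # 1000, 2000, ..., 143000
    return [f"step{s}" for s in [0] + log_spaced + linear]


def load_checkpoint(revision, model_name="EleutherAI/pythia-70m"):
    """Download one checkpoint. The model name is an assumption; other
    Pythia sizes should work the same way."""
    # Imported lazily so pythia_revisions() works without transformers.
    from transformers import GPTNeoXForCausalLM
    return GPTNeoXForCausalLM.from_pretrained(model_name, revision=revision)


def weight_std_by_param(model):
    """Hypothetical example statistic: std of every 2-D weight matrix."""
    return {
        name: p.detach().float().std().item()
        for name, p in model.named_parameters()
        if p.ndim == 2
    }


# Example use (downloads weights, so left commented out):
# for rev in pythia_revisions()[::20]:
#     stats = weight_std_by_param(load_checkpoint(rev))
#     print(rev, sum(stats.values()) / len(stats))
```

Subsampling the revisions keeps the download cost manageable; the smallest model is the cheapest place to try this before scaling up.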