Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?


New experiment: recording myself in real time as I do mechanistic interpretability research! I try to answer the question of what happens if you train a toy transformer without positional embeddings on the task of "predict the previous token". It turns out that a two-layer model can re-derive positional information! You can watch me do it here, and you can follow along with my code here. This uses EasyTransformer, a transformer mechanistic interpretability library I'm writing, and this was a good excuse to test it out and create a demo!
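For concreteness, here is a minimal sketch of the task setup in plain PyTorch. This is not the EasyTransformer code from the video; the model, hyperparameters, and helper names are all illustrative assumptions. The key points are that the label at each position is simply the previous token, and that no positional embeddings are added, so the only position-dependent signal the model gets is the causal mask.

```python
import torch
import torch.nn as nn

def make_batch(batch_size, seq_len, d_vocab, bos_token=0):
    # Random token sequences; the target at position i is the token at position i - 1.
    tokens = torch.randint(1, d_vocab, (batch_size, seq_len))
    tokens[:, 0] = bos_token  # fixed BOS token, so position 0 has a well-defined target
    targets = tokens.clone()
    targets[:, 1:] = tokens[:, :-1]  # shift right: "predict the previous token"
    return tokens, targets

class NoPosTransformer(nn.Module):
    """Toy two-layer decoder-only transformer with token embeddings but NO positional embeddings."""
    def __init__(self, d_vocab=64, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(d_vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.unembed = nn.Linear(d_model, d_vocab)

    def forward(self, tokens):
        x = self.embed(tokens)  # deliberately no positional embedding added here
        seq_len = tokens.shape[1]
        # Standard causal mask: -inf above the diagonal blocks attention to future positions.
        causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        x = self.blocks(x, mask=causal_mask)
        return self.unembed(x)

model = NoPosTransformer()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
for step in range(2000):
    tokens, targets = make_batch(64, 32, 64)
    logits = model(tokens)
    loss = nn.functional.cross_entropy(logits.flatten(0, 1), targets.flatten())
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the setup: if a model trained this way solves the task at every position, it must have re-derived some internal notion of position, because nothing in its embeddings distinguishes one position from another.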

This is an experiment in recording and publishing myself doing "warts and all" research: figuring out how to train the model and operationalising an experiment (including 15 minutes of debugging loss spikes...), real-time coding and tensor fuckery, and using my go-to toolkit. My hope is to give a flavour of what actual research can look like: how long things actually take, how often things go wrong, what my thought process is and what I'm keeping in my head as I go, what being confused looks like, and how I try to make progress. I'd love to hear whether you found this useful, and whether I should bother making a second half!

Though I don't want to overstate this: it was still a small, self-contained toy question that I chose for being a good example task to record (and I wouldn't have published it if it were TOO much of a mess).