Ok so I have actually, after six months, got round to running some experiments here along the lines of the blegg/rube case. It uses a matching game where I generate k sets of N_nodes elements like {country, colour, number, animal, …}. I then train a model on some of the directed edges (country → animal, etc.) and evaluate on the held-out edges.
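For concreteness, the data generation looks roughly like this (a simplified sketch — the category pools, the bare (source, target) pair format, and the function names are illustrative, not the actual experimental code):

```python
import random

# Illustrative category pools; the real experiments may use different ones.
POOLS = {
    "country": ["France", "Japan", "Peru", "Kenya"],
    "colour":  ["red", "blue", "green", "gold"],
    "number":  ["three", "seven", "nine", "twelve"],
    "animal":  ["otter", "heron", "lynx", "gecko"],
    "shape":   ["cube", "cone", "torus", "prism"],
    "fruit":   ["mango", "plum", "fig", "kiwi"],
}

def make_game(k, n_nodes, n_train_edges=12, seed=0):
    """Generate k matched sets over n_nodes categories, then split the
    directed edges between categories into train and eval portions."""
    rng = random.Random(seed)
    cats = list(POOLS)[:n_nodes]
    # Each category hands out k distinct elements, one per set, so the
    # matching (which elements "go together") is well-defined.
    assignment = {c: rng.sample(POOLS[c], k) for c in cats}
    sets = [{c: assignment[c][i] for c in cats} for i in range(k)]
    # All directed edges between distinct categories, e.g. country -> animal.
    edges = [(a, b) for a in cats for b in cats if a != b]
    rng.shuffle(edges)
    train_edges = edges[:n_train_edges]
    eval_edges = edges[n_train_edges:]
    # A training example: given the source element, predict its matched target.
    train = [(s[a], s[b]) for (a, b) in train_edges for s in sets]
    evals = [(s[a], s[b]) for (a, b) in eval_edges for s in sets]
    return train, evals

train, evals = make_game(k=3, n_nodes=6)
```

With 6 categories there are 30 directed edges, so training on 12 of them and evaluating on the other 18 matches the 12/30 split mentioned below.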
Toy models and Pythia 70M can both learn to correctly match elements along unseen edges. (Weirdly, Gemma 3 1B can't do so when trained with LoRA, even though Pythia 70M can; I haven't tried full fine-tuning of Gemma 3 yet.)
The specific blegg/rube case doesn't work so well, but using random sub-graphs is pretty successful (Pythia can learn a 6-element matching game from 12/30 directed edges; toy models are a bit less stable). You can also see clusters forming in the residual stream, one per set, as the abstractions form.
Might or might not get round to posting it, depending on whether I get a job soon.
I'd be excited to write it! If I get the position I'm hoping for, I'll see if I can write it up as part of the research there; if I don't, I'll write it up in my free time, I suppose. ¯\_(ツ)_/¯