On a recent podcast, Dwarkesh Patel says that Sutskever's SSI is rumored to be working on "test-time training" (at 39:25). Another reason to think this "unhobbling" could arrive soon is that it might turn out to be possible to use agentic (tool-using) RLVR to train AIs to prepare datasets for finetuning variants of themselves (not necessarily with RLVR), and those variants would then do better at particular tasks.
How many people are working on test-time learning? How feasible do you think it is?
From a new comment elsewhere: