Thank you for the comment, Saul. I agree with a lot of your points, in particular that “explosive” periods are costly and inefficient (relative, perhaps, to some ideal), and that they are not in and of themselves a solution for long-term retention.
I expect that if we have a crux, it’s whether someone who intends to follow an incremental path or someone who does an intense acquisition period is more likely, ~a year later, to actually have the skill. My guess is that, for a number of reasons, it’s the latter; I’d expect a lot of incrementalists to ‘just not actually do the thing’.
* My ideal strategy would be “lightly explore a number of things to determine what you want → explode towards that for an intense period of time → establish incremental practices to maintain and improve”
* Your comment also highlighted for me something I had cut from the initial draft: my belief that explosive periods help overcome emotional blockers, which I think might be a big part of why people shy away from skills they say they want.
In Import AI 439, Jack Clark references a new paper, *Universally Converging Representations of Matter Across Scientific Foundation Models*:
> Do AI systems end up finding similar ways to represent the world to themselves? Yes, as they get smarter and more capable, they arrive at a common set of ways of representing the world.
> The latest evidence for this is research from MIT which shows that this is true for scientific models and the modalities they’re trained on: “representations learned by nearly sixty scientific models, spanning string-, graph-, 3D atomistic, and protein-based modalities, are highly aligned across a wide range of chemical systems,” they write. “Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in representation space as they improve in performance, suggesting that foundation models learn a common underlying representation of physical reality.”
> As with other studies of representation, they found that as you scale the data and compute models are trained on, “their representations converge further”.
This seems like useful empirical evidence for the Natural Abstraction Hypothesis (I haven’t been following progress on that research agenda, so I don’t know how significant an update this is).
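To make concrete what it means for two models’ representations to be “highly aligned”: the usual move in this line of work is to embed the same inputs with both models and compare the resulting geometries with a similarity index such as linear CKA (Kornblith et al., 2019). The sketch below is a generic illustration of that idea under my own assumptions; it is not the paper’s specific metric, models, or data, and the embeddings are synthetic stand-ins.

```python
# Minimal sketch: measuring alignment between two models' representations
# of the same inputs with linear CKA. Illustrative only; not the paper's method.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between two representation matrices.

    X: (n_samples, d1) embeddings of the same inputs from model A
    Y: (n_samples, d2) embeddings of the same inputs from model B
    Returns a score in [0, 1]; 1 means the representations are identical
    up to rotation and isotropic scaling.
    """
    # Center each representation so only its geometry matters.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))
    return float(numerator / denominator)

# Hypothetical usage: embeddings of the same 1,000 molecules from two "models".
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(1000, 256))              # stand-in for model A's embeddings
Q, _ = np.linalg.qr(rng.normal(size=(256, 256)))  # random orthogonal rotation
emb_b = emb_a @ Q                                 # "model B": same geometry, different coordinates
print(linear_cka(emb_a, emb_b))                   # ~1.0, i.e. maximally aligned
```

Because a metric like CKA is invariant to rotation and isotropic scaling of the embedding space, two models can count as aligned even when their raw coordinates look nothing alike, which is the sense in which models trained on different modalities can share a common underlying representation.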