I’ll try to make this clearer if I turn it into a more serious top-level post. My intent here was just to push this out since it’s been bothering me, but I have other things to do.
TLDR: Lots of researchers seem to be banking on the idea that LLMs already generalize out of distribution (OOD), or that scale will solve this (either through scale alone or through scale plus using the scaled model to make a research breakthrough that does). A lot of research and funding seems to hinge on this assumption, and imo that dependence is underappreciated. If taken seriously, it may mean that 1) timelines are longer, 2) we should expect a fundamental reshaping of AI cognition due to LLMs’ inability to generalize OOD, and 3) we shouldn’t update much on alignment progress based on current safety research.
I shared this in the post, but here are more thoughts.
This post by @Hyperion describes another natural consequence of the above with respect to RSI (that the field seems to be understating):
Some takes about RSI from discussions with many smart researchers & thinkers:
1. Many RSI (or automated AI R&D) debates converge on similar cruxes: is a 1000x sample-efficiency improvement possible, can you just simulate reality and train on it with no sim2real gap, can we easily make models good at “fuzzy” tasks? People like to assume that automated research agents will find such breakthroughs specifically *because* without them, progress could be heavily bottlenecked on data or continued compute scale-ups.
2. The Yudkowsky “genius brain in a box” framing of ASI has a latent influence on many researchers’ views, even if they aren’t aware of it. A common move is to “flip” predictions as they go further out: from assuming LLM- or deep-learning-specific properties of future AI to assuming “von Neumann x1000”, human-brain-like properties. I’d like to see more careful reasoning about why this flip should occur at any particular point (e.g. before or after automated AI R&D); this question is a crux behind many predictions, like AI 2027.
3. Some cracks in this worldview are beginning to show: predictions from a few years ago that models would be less jagged by now than they are, that they would be more deceptive, that synthetic data would work better, etc. Many of these look like prediction errors from imagining future models as a “human brain in a box”, whereas LLMs are empirically a different kind of intelligence. Most models of a software-only intelligence explosion are also coarse enough to largely ignore the properties of LLMs.
4. Views about fast RSI progress seem to be correlated with (a) belief that synthetic data is all you need (b) belief in very high GDP growth and an industrial explosion because of automated firms (c) having worked only in AI research or in small organizations.
5. Key technical things to track over the next 1-2 years: whether RL generalizes better, AI lab data spend, whether we can automate synthetic RL environment construction, best practices for forward-deployed engineers (FDEs) deploying AI into large enterprises, the coherence of AI personas, how powerful multi-agent scaling of test-time compute will be, and continual learning.
6. Overall I think the “RSI leading to *fast* takeoff” frame had huge alpha in 2022, moderate in 2024, and potentially is of neutral usefulness in 2026 for predicting the future.
What is the thesis here? I’ve read this through and I don’t get the point you’re trying to get across.