I like the reasoning behind this post, but I’m not sure I buy the conclusion. Here’s an attempt at excavating why not:
If I may try to paraphrase, I’d say your argument has two parts:
(1) Humans had a “sharp left turn” not because of some underlying jump in brain capabilities, but because of shifting from one way of gaining capabilities to another (from solo learning to culture).
(2) Contemporary AI training is more analogous to “already having culture,” so we shouldn’t expect that things will accelerate in ways ML researchers don’t already anticipate based on trend extrapolations.
Accordingly, we shouldn’t expect AIs to get a sharp left turn.
I think I buy (1) but I’m not sure about (2).
Here’s an attempt at arguing that AI training will still get a “boost from culture.” If I’m right, it could even be the case that their “boost from culture” will be larger than it was for early humans because we now have a massive culture overhang.
Or maybe “culture” isn’t exactly the right concept, and the better phrase is something like “a generality-and-stacking-insights-on-top-of-each-other threshold reached via deep causal understanding.” If we look at human history, it’s not just the start of cultural evolution that stands out – it’s also the scientific revolution! (A lot of cultural evolution worked despite individual humans not understanding why they do the things that they do [Henrich’s “The Secret of Our Success”] – by contrast, science is different and requires at least some scientists to understand deeply what they’re doing.)
My intuition is that there’s an “intelligence” threshold past which all the information on the internet suddenly becomes a lot more useful. When Nate/MIRI speak of a “sharp left turn,” my guess is that they mean some understanding-driven thing. (And it has less to do with humans following unnecessarily convoluted rules about food preparation whose purpose they don’t even understand, where following the rules nonetheless somehow prevents them from poisoning themselves.) It’s not “culture” per se, but we needed culture to get there (and maybe it matters “what kind of culture” – e.g., education with scientific mindware).
Elsewhere, I expressed it as follows:
I suspect that there’s a phase transition that happens when agents get sufficiently good at what Daniel Kokotajlo and Ramana Kumar call “P₂B” (a recursive acronym for “Plan to P₂B Better”). When it comes to “intelligence,” it seems to me that we can distinguish between “learning potential” and “trained/crystallized intelligence” (or “competence”). Children who grow up in an enculturated/learning-friendly setting (as opposed to, e.g., feral children or Helen Keller before she met her teacher) reach a threshold where their understanding of the world and their thoughts becomes sufficiently deep to kickstart a feedback loop. Instead of aimlessly absorbing what’s around them, they prioritize learning the skills and habits of thinking that seem beneficial according to their goals. In this process, slight differences in “learning potential” can significantly affect where a person ends up in their intellectual prime. So, “learning potential” may be gradual, but above a specific threshold (humans above, chimpanzees below), there’s a discontinuity in how it translates to “trained/crystallized intelligence” after a lifetime of (self-)directed learning. Moreover, it seems that we can tell that the slope of the graph (y-axis: “trained/crystallized intelligence;” x-axis: “learning potential”) around the human range is steep.
To quote something I’ve written previously:
“If the child in the chair next to me in fifth grade was slightly more intellectually curious, somewhat more productive, and marginally better disposed to adopt a truth-seeking approach and self-image than I was, this could initially mean they score 100% and I score 95% on fifth-grade tests – no big difference. But as time goes on, their productivity gets them to read more books, their intellectual curiosity and good judgment get them to read more unusually useful books, and their cleverness gets them to integrate all this knowledge in better and increasingly more creative ways. [...] By the time we graduate university, my intellectual skills are mostly useless, while they have technical expertise in several topics, can match or even exceed my thinking even in areas I specialized in, and get hired by some leading AI company.
[...]
If my 12-year-old self had been brain-uploaded to a suitable virtual reality, made copies of, and given the task of devouring the entire internet in 1,000 years of subjective time (with no aging) to acquire enough knowledge and skill to produce novel and for-the-world useful intellectual contributions, the result probably wouldn’t be much of a success. If we imagined the same with my 19-year-old self, there’s a high chance the result wouldn’t be useful either – but also some chance it would be extremely useful. [...] I think it’s at least plausible that there’s a jump once the copies reach a level of intellectual maturity to make plans which are flexible enough [...] and divide labor sensibly [...].”
In other words, I suspect there’s a discontinuity at the point where the P₂B feedback loop hits its critical threshold.
So, my intuition here is that we’ll see a phase change once AIs reach the kind of deeper understanding of things that allows them to form better learning strategies. That phase transition will be similar in kind to going from no culture to culture, but it’s more like “AIs suddenly grokking rationality/science to a sufficient degree that they can stack insights reliably enough to avoid deteriorating results.” (Once they grok it, the update permeates everything they’ve read – and since they read large parts of the internet, the result will be massive.)
I’m not sure what all this implies about values generalizing to new contexts / matters of alignment difficulty. You seem open to the idea of fast takeoff through AIs improving training data, which seems related to my notion of “AIs get smart enough to notice on their own what type of internet-text training data is highest quality vs what’s dumb or subtly off.” So, maybe we don’t disagree much and your objection to the “sharp left turn” concept has to do with the connotations it has for alignment difficulties.
Interesting – this seems quite similar to the idea that human intelligence is around some critical threshold for scientific understanding and reasoning. I’m skeptical that it’s useful to think of this as “culture” (except insofar as AIs hear about the scientific method and mindset from training data, which will apply to anything trained on Common Crawl), but the broader point does seem to be a major factor in whether there is a “sharp left turn”.
I don’t think AIs acquiring scientific understanding and reasoning is really a crux for a sharp left turn: moderately intelligent humans who understand the importance of scientific understanding and reasoning, and are actively trying to use it, seem (to me) able to use it very well when biases aren’t getting in the way. Very high-g humans can outsmart them trivially in some domains (like pure math) and to a limited extent in others (like social manipulation). But even if you would describe those capability gains as dramatic, it doesn’t seem like you can attribute them to greater awareness of abstract and scientific reasoning. Unless you think an AI that’s only barely able to grok these concepts might be an existential threat, or that there are further levels of understanding beyond what very smart humans have, I don’t think there’s a good reason to worry about a jump to superhuman capabilities due to gaining capabilities like P₂B.
On the other hand, you raise the concern that AIs would be able to figure out which parts of their training data are high quality or low quality (and then presumably adjust their own training to exclude the low-quality data). I agree with you that this is a potential cause of a discontinuity, though if foundation model developers start using their previous AIs to pre-filter (or generate) training data in this way, then I think we shouldn’t see a big discontinuity due to the training data.
Like you, I am not sure what this implies about whether a sharp gain in capability would likely be accompanied by a major change in alignment (compared to the case where that capability gain had happened gradually).