I like the reasoning behind this post, but I’m not sure I buy the conclusion. Here’s an attempt at excavating why not:
If I may try to paraphrase, I’d say your argument has two parts:
(1) Humans had a “sharp left turn” not because of some underlying jump in brain capabilities, but because of shifting from one way of gaining capabilities to another (from solo learning to culture).
(2) Contemporary AI training is more analogous to “already having culture,” so we shouldn’t expect that things will accelerate in ways ML researchers don’t already anticipate based on trend extrapolations.
Accordingly, we shouldn’t expect AIs to get a sharp left turn.
I think I buy (1) but I’m not sure about (2).
Here’s an attempt at arguing that AIs will still get a “boost from culture.” If I’m right, their “boost from culture” could even be larger than it was for early humans, because we now have a massive culture overhang.
Or maybe “culture” isn’t exactly the right concept, and the better phrase is something like “the generality-and-stacking-insights-on-top-of-each-other threshold that comes from deep causal understanding.” If we look at human history, it’s not just the start of cultural evolution that stands out – it’s also the scientific revolution! (A lot of cultural evolution worked despite individual humans not understanding why they do the things they do [Henrich’s “The Secret of Our Success”]; by contrast, science requires at least some scientists to understand deeply what they’re doing.)
My intuition is that there’s an “intelligence” threshold past which all the information on the internet suddenly becomes a lot more useful. When Nate/MIRI speak of a “sharp left turn,” my guess is that they mean some understanding-driven thing. (It has less to do with things like humans following unnecessarily convoluted food-preparation rules whose purpose they don’t understand, where following the rules nonetheless keeps them from poisoning themselves.) It’s not “culture” per se, but we needed culture to get there (and maybe it matters “what kind of culture” – e.g., education with scientific mindware).
I expressed this elsewhere as follows (quoting from that text):
I suspect that there’s a phase transition that happens when agents get sufficiently good at what Daniel Kokotajlo and Ramana Kumar call “P₂B” (a recursive acronym for “Plan to P₂B Better”). When it comes to “intelligence,” it seems to me that we can distinguish between “learning potential” and “trained/crystallized intelligence” (or “competence”). Children who grow up in an enculturated, learning-friendly setting (as opposed to, e.g., feral children or Helen Keller before she met her teacher) reach a threshold where their understanding of the world and of their own thoughts becomes sufficiently deep to kickstart a feedback loop. Instead of aimlessly absorbing what’s around them, they prioritize learning the skills and habits of thinking that seem beneficial according to their goals. In this process, slight differences in “learning potential” can significantly affect where a person ends up in their intellectual prime. So, “learning potential” may be gradual, but above a specific threshold (humans above, chimpanzees below), there’s a discontinuity in how it translates to “trained/crystallized intelligence” after a lifetime of (self-)directed learning. Moreover, the slope of the graph (y-axis: “trained/crystallized intelligence;” x-axis: “learning potential”) seems steep around the human range.
To quote something I’ve written previously:
“If the child in the chair next to me in fifth grade was slightly more intellectually curious, somewhat more productive, and marginally better disposed to adopt a truth-seeking approach and self-image than I was, this could initially mean they score 100% and I score 95% on fifth-grade tests – no big difference. But as time goes on, their productivity gets them to read more books, their intellectual curiosity and good judgment get them to read more unusually useful books, and their cleverness gets them to integrate all this knowledge in better and increasingly creative ways. [...] By the time we graduate university, my intellectual skills are mostly useless, while they have technical expertise in several topics, can match or exceed my thinking even in areas I specialized in, and get hired by some leading AI company.
[...]
If my 12-year-old self had been brain-uploaded into a suitable virtual reality, copied, and given the task of devouring the entire internet over 1,000 years of subjective time (with no aging) to acquire enough knowledge and skill to produce novel intellectual contributions that are useful to the world, the result probably wouldn’t be much of a success. If we imagined the same with my 19-year-old self, there’s a high chance the result wouldn’t be useful either – but also some chance it would be extremely useful. [...] I think it’s at least plausible that there’s a jump once the copies reach a level of intellectual maturity to make plans which are flexible enough [...] and divide labor sensibly [...].”
In other words, I suspect there’s a discontinuity at the point where the P₂B feedback loop hits its critical threshold.
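To make the shape of that claim a bit more concrete, here is a toy sketch. Everything in it is my own illustrative assumption (the number of learning rounds, the 5% reinvestment factor, the hard threshold at 1.0); it isn't anything from your post or from Kokotajlo/Kumar, just a cartoon of "small differences in learning potential compound once the feedback loop kicks in."

```python
# Toy model (illustrative assumptions only): "crystallized intelligence"
# accumulates over repeated learning rounds. Agents above a "P2B threshold"
# can reinvest what they've learned to learn better next round (the feedback
# loop); agents below it keep learning, but their gains never compound.

def crystallized_intelligence(learning_potential, rounds=40, threshold=1.0):
    """Accumulated competence after `rounds` of (self-)directed learning.

    learning_potential: per-round learning rate (think chimps below 1.0,
    humans above, on this made-up scale).
    """
    competence = 0.0
    rate = learning_potential
    for _ in range(rounds):
        competence += rate
        if learning_potential > threshold:
            rate *= 1.05  # reinvest gains: "plan to plan better"
    return competence

if __name__ == "__main__":
    for lp in [0.90, 0.99, 1.01, 1.10]:
        print(f"learning potential {lp:.2f} -> competence {crystallized_intelligence(lp):7.1f}")
```

The point is only the qualitative shape: an agent at 0.99 and one at 1.01 look almost identical in "learning potential," but after many rounds the one above the threshold ends up with several times the competence, which is the discontinuity and steep slope I have in mind.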
So, my intuition here is that we’ll see a phase change once AIs reach the kind of deeper understanding of things that allows them to form better learning strategies. That phase transition will be similar in kind to going from no culture to culture, but it’s more like “AIs suddenly grokking rationality/science to a sufficient degree that they can stack insights on top of each other reliably enough to avoid deteriorating results.” (Once they grok it, the update permeates everything they’ve read – and since they’ve read large parts of the internet, the result will be massive.)
I’m not sure what all this implies about values generalizing to new contexts / how difficult alignment will be. You seem open to the idea of fast takeoff through AIs improving their own training data, which seems related to my notion of “AIs get smart enough to notice on their own which internet-text training data is highest quality and which is dumb or subtly off.” So, maybe we don’t disagree much, and your objection to the “sharp left turn” concept is mainly about the connotations it carries for alignment difficulty.
Yeah, but if this is the case, I’d have liked to see a bit more balance than just retweeting the tribal-affiliation slogan (“OpenAI is nothing without its people”) and saying that the board should resign (or, in Ilya’s case, implying that he regrets and denounces everything he initially stood for together with the board). I think it’s a defensible take that the board should resign after how things went down, but the board was probably pointing to some real concerns, and those won’t get addressed at all if the pendulum now swings too far in the opposite direction. So I would have at least hoped for something like: “the board should resign, but here are some things I think they had a point about, which I’d like to see not get swept under the rug after the counter-revolution.”