I like this post but I think it misses / barely covers two of the most important cases for optimism.
1. Detail of specification
Frontier LLMs have a very good understanding of humans, and seem to model them as well as or even better than other humans. I recall seeing repeated reports of Claude understanding its interlocutor faster than they thought was possible, as if it just “gets” them, e.g. from one Reddit thread I quickly found:
“sometimes, when i’m tired, i type some lousy prompts, full of typos, incomplete info etc, but Claude still gets me, on a deep fucking level”
“The ability of how Claude AI capture your intentions behind your questions is truly remarkable. Sometimes perhaps you’re being vague or something, but it will still get you.”
“even with new chats, it still fills in the gaps and understands my intention”
LLMs have presumably been trained on:
millions of anecdotes from the internet, including how the author felt, other users’ reactions and commentary, etc.
case law: how humans chosen for their wisdom (judges) determined what was right and wrong
thousands of philosophy books
Lesswrong / Alignment Forum, with extensive debate on what would be right and wrong for AIs to do
There are also techniques like deliberative alignment, which includes an explicit specification for how AIs should behave. I don’t think the model spec is currently detailed enough but I assume OpenAI intend to actively update it.
Compare this to the “specification” humans are given by your Ev character: some basic desires for food, comfort, etc. Our desires are very crude, confusing, and inconsistent, and only very roughly correlated with IGF. It’s hard to overstate how much more detailed the specification we present to AI models is.
2. (Somewhat) Gradual Scaling
Toby Ord estimates that pretraining “compute required scales as the 20th power of the desired accuracy”. He estimates that inference scaling is even more expensive, requiring exponentially more compute just to make constant progress. Both of these trends suggest that, even with large investments, performance will increase slowly from hardware alone (this relies on the assumption that hardware performance / $ is increasing slowly, which seems empirically justified). Progress could be faster if big algorithmic improvements are found. In particular, I want to call out that recursive self-improvement (especially without a human in the loop) could blow up this argument (which is why I wish it were banned). Still, I’m overall optimistic that capabilities will scale fairly smoothly and predictably.
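To make the pretraining half of that concrete, here’s a toy back-of-the-envelope sketch (my own illustrative numbers, just taking “compute scales as the 20th power of accuracy” at face value; Ord’s actual model is more careful than this):

```python
# Toy illustration only: assume required compute ~ accuracy^20, so the
# compute multiplier for a given relative accuracy gain is (1 + gain)^20.

def compute_multiplier(relative_gain: float, exponent: float = 20.0) -> float:
    """Compute factor needed for a given relative accuracy improvement."""
    return (1.0 + relative_gain) ** exponent

for gain in (0.01, 0.10, 0.50, 1.00):
    print(f"{gain:.0%} better accuracy -> ~{compute_multiplier(gain):,.1f}x the compute")

# 1% better accuracy -> ~1.2x the compute
# 10% better accuracy -> ~6.7x the compute
# 50% better accuracy -> ~3,325.3x the compute
# 100% better accuracy -> ~1,048,576.0x the compute
```

Even small constant-percentage accuracy gains get very expensive very fast, which is the sense in which I expect compute-driven progress to stay gradual.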
With (1) and (2) combined, we’re able to gain some experience with each successive generation of models, and add anything we find missing to the training dataset / model spec, without taking any leaps that are too big or dangerous. I don’t want to suggest that this process of scaling up while maintaining alignment will definitely succeed, just that we should update toward the optimistic view based on these arguments.
For (2), I’m gonna uncharitably rephrase your point as saying: “There hasn’t been a sharp left turn yet, and therefore I’m overall optimistic there will never be a sharp left turn in the future.” Right?
I’m not really sure how to respond to that … I feel like you’re disagreeing with one of the main arguments of this post without engaging it. Umm, see §1. One key part is §1.5:
I do make the weaker claim that, as of this writing, publicly-available AI models do not have the full (1-3) triad—generation, selection, and open-ended accumulation—to any significant degree. Specifically, foundation models are not currently set up to do the “selection” in a way that “accumulates”. For example, at an individual level, if a human realizes that something doesn’t make sense, they can and will alter their permanent knowledge store to excise that belief. Likewise, at a group level, in a healthy human scientific community, the latest textbooks delete the ideas that have turned out to be wrong, and the next generation of scientists learns from those now-improved textbooks. But for currently-available foundation models, I don’t think there’s anything analogous to that. The accumulation can only happen within a context window (which is IMO far more limited than weight updates), and also within pre- and post-training (which are in some ways anchored to existing human knowledge; see discussion of o1 in §1.1 above).
…And then §3.7:
Back to AGI, if you agree with me that today’s already-released AIs don’t have the full (1-3) triad to any appreciable degree [as I argued in §1.5], and that future AI algorithms or training approaches will, then there’s going to be a transition between here and there. And this transition might look like someone running a new training run, from random initialization, with a better learning algorithm or training approach than before. While the previous training runs create AIs along the lines that we’re used to, maybe the new one would be like (as gwern said) “watching the AlphaGo Elo curves: it just keeps going up… and up… and up…”. Or, of course, it might be more gradual than literally a single run with a better setup. Hard to say for sure. My money would be on “more gradual than literally a single run”, but my cynical expectation is that the (maybe a couple years of) transition time will be squandered, for various reasons in §3.3 here.
I do expect that there will be a future AI advance that opens up full-fledged (1-3) triad in any domain, from math-without-proof-assistants, to economics, to philosophy, and everything else. After all, that’s what happened in humans. Like I said in §1.1, our human discernment, (a.k.a. (2B)) is a flexible system that can declare that ideas do or don’t hang together and make sense, regardless of its domain.
This post is agnostic over whether the sharp left turn will be a big algorithmic advance (akin to switching from MuZero to LLMs, for example), versus a smaller training setup change (akin to o1 using RL in a different way than previous LLMs, for example). [I have opinions, but they’re out-of-scope.] A third option is “just scaling the popular LLM training techniques that are already in widespread use as of this writing”, but I don’t personally see how that option would lead to the (1-3) triad, for reasons in the excerpt above. (This is related to my expectation that LLM training techniques in widespread use as of this writing will not scale to AGI … which should not be a crazy hypothesis, given that LLM training techniques were different as recently as ≈6 months ago!) But even if you disagree, it still doesn’t really matter for this post. I’m focusing on the existence of the sharp left turn and its consequences, not what future programmers will do to precipitate it.
~~
For (1), I did mention that we can hope to do better than Ev (see §5.1.3), but I still feel like you didn’t even understand the major concern that I was trying to bring up in this post. Excerpting again:
The optimistic “alignment generalizes farther” argument is saying: if the AI is robustly motivated to be obedient (or helpful, or harmless, or whatever), then that motivation can guide its actions in a rather wide variety of situations.
The pessimistic “capabilities generalize farther” counterargument is saying: hang on, is the AI robustly motivated to be obedient? Or is it motivated to be obedient in a way that is not resilient to the wrenching distribution shifts that we get when the AI has the (1-3) triad (§1.3 above) looping around and around, repeatedly changing its ontology, ideas, and available options?
Again, the big claim of this post is that the sharp left turn has not happened yet. We can and should argue about whether we should feel optimistic or pessimistic about those “wrenching distribution shifts”, but those arguments are as yet untested, i.e. they cannot be resolved by observing today’s pre-sharp-left-turn LLMs. See what I mean?
For (2), I’m gonna uncharitably rephrase your point as saying: “There hasn’t been a sharp left turn yet, and therefore I’m overall optimistic there will never be a sharp left turn in the future.” Right?
Hm, I wouldn’t have phrased it that way. Point (2) says nothing about the probability of there being a “left turn”, just the speed at which it would happen. When I hear “sharp left turn”, I picture something getting out of control overnight, so it’s useful to contextualize how much compute you have to put in to get performance out, since this suggests that (inasmuch as it’s driven by compute) capabilities ought to grow gradually.
I feel like you’re disagreeing with one of the main arguments of this post without engaging it.
I didn’t mean to disagree with anything in your post, just to add a couple points which I didn’t think were addressed.
You’re right that point (2) wasn’t engaging with the (1-3) triad, because it wasn’t meant to. It’s only about the rate of growth of capabilities (which is important because, if each subsequent model is only 10% more capable than the one which came before, then there’s good reason to think that alignment techniques which work well on current models will also work on subsequent models).
Again, the big claim of this post is that the sharp left turn has not happened yet. We can and should argue about whether we should feel optimistic or pessimistic about those “wrenching distribution shifts”, but those arguments are as yet untested, i.e. they cannot be resolved by observing today’s pre-sharp-left-turn LLMs. See what I mean?
I do see, and I think this gets at the difference in our (world) models. In a world where there’s a real discontinuity, you’re right, you can’t say much about a post-sharp-turn LLM. In a world where there’s continuous progress, like I mentioned above, I’d be surprised if a “left turn” suddenly appeared without any warning.
Thanks! I still feel like you’re missing my point; let me try again. Thanks for being my guinea pig as I try to get to the bottom of it. :)
inasmuch as it’s driven by compute
In terms of the “genome = ML code” analogy (§3.1), humans today have the same compute as humans 100,000 years ago. But humans today have dramatically more capabilities—we have invented the scientific method and math and biology and nuclear weapons and condoms and Fortnite and so on, and we did all that, all by ourselves, autonomously, from scratch. There was no intelligent external non-human entity who was providing humans with bigger brains or new training data or new training setups or new inference setups or anything else.
If you look at AI today, it’s very different from that. LLMs today work better than LLMs from six months ago, but only because there was an intelligent external entity, namely humans, who was providing the LLM with more layers, new training data, new training setups, new inference setups, etc.
…And if you’re now thinking “ohhh, OK, Steve is just talking about AI doing AI research, like recursive self-improvement, yeah duh, I already mentioned that in my first comment” … then you’re still misunderstanding me!
Again, think of the “genome = ML code” analogy (§3.1). In that analogy,
“AIs building better AIs by doing the exact same kinds of stuff that human researchers are doing today to build better AIs”
…would be analogous to…
“Early humans creating more intelligent descendants by doing biotech or selective breeding or experimentally-optimized child-rearing or whatever”.
But humans didn’t do that. We still have basically the same brains as our ancestors 100,000 years ago. And yet humans were still able to dramatically autonomously improve their capabilities, compared to 100,000 years ago. We were making stone tools back then, we’re making nuclear weapons now.
Thus, autonomous learning is a different axis of AI capabilities improvement. It’s unrelated to scaling, and it’s unrelated to “automated AI capabilities research” (as typically envisioned by people in the LLM-sphere). And “sharp left turn” is what I’m calling the transition from “no open-ended autonomous learning” (i.e., the status quo) to “yes open-ended autonomous learning” (i.e., sometime in the future). It’s a future transition, and it has profound implications, and it hasn’t even started (§1.5). It doesn’t have to happen overnight—see §3.7. See what I mean?
Thanks for your patience: I do think this message makes your point clearly. However, I’m sorry to say, I still don’t think I was missing the point. I reviewed §1.5, still believe I understand the open-ended autonomous learning distribution shift, and also find it scary. I also reviewed §3.7, and found it to basically match my model, especially this bit:
Or, of course, it might be more gradual than literally a single run with a better setup. Hard to say for sure. My money would be on “more gradual than literally a single run”, but my cynical expectation is that the (maybe a couple years of) transition time will be squandered
Overall, I don’t have the impression we disagree too much. My guess for what’s going on (and it’s my fault) is that my initial comment’s focus on scaling was not a reaction to anything you said in your post; in fact, you didn’t say much about scaling at all. It was more a response to the scaling discussion I see elsewhere.
Interesting! I wonder what makes people feel like LLMs get them. I for sure don’t feel like Claude gets me. If anything, the opposite.
EDIT: Deepseek totally gets me tho
Not OP, but for me, it comes down to LLMs correctly interpreting the intent behind my questions/requests. In other words, I don’t need to be hyper specific in my prompts in order to get the results I want.
That makes sense and is, well, pretty obvious. Why isn’t Claude getting me, though, when he’s getting other people? It’s hard for me to even explain to Claude what kind of code he should write. Is it just a skill issue? Can someone teach me how to prompt Claude?