This is still quite a good post on how to think about AI in the near term, and the lessons generalize well beyond the specific examples.
The main lessons I take away from this post are these:
Take AI jaggedness seriously, and don’t expect problems to be AI-complete (yes, continual/online learning will reduce the jaggedness, but I don’t expect it to eliminate jagged capabilities).
Summed up as: “often the first system to do X will not be the first system to do Y.”
Be careful of relying too much on abstraction, and remember that reality is surprisingly detailed (which partly explains why AI capabilities are as jagged as they are).
Summed up by this quote:
But knowing only that superintelligences are really intelligent doesn’t help with designing the scheming-focused capability evaluations we should do on GPT-5, and abstracting over the specific prerequisite skills makes it harder to track when we should expect scheming to be a problem (relative to other capabilities of models).[1] And this is the viewpoint I was previously missing.
Being better than a human at something, or more generally being better than someone at something, is far more mundane than fictional portrayals of superheroes/AIs suggest (this is usually a consequence of the human tendency to assume discontinuities everywhere, combined with rules in fiction often being much more ad hoc than in our reality).
This matters because it should update us towards believing we can delegate/automate quite a lot of alignment research in the critical period, meaning our chance of surviving AI is higher than often assumed.
One caveat: I now believe persuasion/epistemics is one of the few areas where I expect much more discontinuous change, but I’ll discuss that below when I cover my use of reacts on the post.
We should take AI safety automation more seriously. This is admittedly something LWers have gotten a lot better at, but the reminder was needed at the time.
I admit this overlaps heavily with the 3rd lesson, so I’m not going to dwell on details here.
Just know that AI automation of AI safety shouldn’t be dismissed casually.
Now that I’m done listing off lessons, I want to talk about my use of reacts.
I used 5 reacts on the post, and I missed the mark on 2 of them: the “hit the mark” react on persuasion and the “important” react on a study.
The main reason I missed the mark here is that I now believe the study only shows it’s easy for AIs to persuade people for a short time when the topic isn’t salient, and unfortunately most high-value epistemic applications will make the topic salient, meaning persuasion is much harder than people believe.
Persuasion/epistemics is one of the few domains where I expect strongly discontinuous progress, but because persuasion is so hard, I now think AI persuasion is much less of a threat than I used to (in the regime where our choices actually matter for AI risk). This also makes me more optimistic than I used to be about trusting AI outputs, even in domains where tasks are hard to verify, because humans are very, very hard to persuade.
(A good book on this is Not Born Yesterday: The Science of Who We Trust and What We Believe by Hugo Mercier.)
I’d give this a +4 vote. It’s good and important for the near term, and while not the most important post on AI, it’s still a pretty good collection of ideas (though persuasion capability will increase far more discontinuously than the author claims).