The two quotes from Paul that I was thinking of when I cited these posts, one from each article:
I believe that having a better understanding of LM agents increases safety via two channels, [one of which is:] If LM agents are weak due to exceptionally low investment and understanding, it creates “dry tinder:” as incentives rise that investment will quickly rise and so low-hanging fruit will be picked. While there is some dependence on serial time, I think that increased LM investment now will significantly slow down progress later.
and
Avoiding RLHF at best introduces an important overhang: people will implicitly underestimate the capabilities of AI systems for longer, slowing progress now but leading to faster and more abrupt change later as people realize they’ve been wrong. Similarly, to the extent you successfully slow scaling, you are then in for faster scaling later from a lower initial amount of spending. Overall in expectation I think these effects claw back most of the benefits of slowing down progress by avoiding RLHF.
Based on these I don’t think my statement was wildly inaccurate, since Paul has used the argument “speeding up AI capabilities prevents overhangs” as a significant part of his defense of two things which speed up AI capabilities. (Also I ran a copy of this post past Paul before publishing and he commented on a bunch of things, but not this part—though he might not have seen this specific sentence.)
However, I do agree it was mildly misleading, since it might lead readers to think that Paul endorses doing some things primarily for the purpose of speeding up AI capabilities to produce overhangs. Thanks for pushing back on that.
I’ve edited the statement to instead read: “The idea that harms from speeding up AI capabilities progress can be largely offset by benefits from preventing capabilities overhangs.”
Based on these I don’t think my statement was wildly inaccurate
Sorry, you’re correct that by the usual standards your statement isn’t wildly inaccurate, just misleading. I have been spoiled by my personal walled garden.
Fwiw (and I agree this is a nitpick) I wouldn’t phrase it as “The idea that harms from speeding up AI capabilities progress can be largely offset by benefits from preventing capabilities overhangs”. Fundamentally what’s going on is a decomposition and analysis of the overall consequences of an action (certain kinds of safety research), where you cannot easily separate the consequences from each other and only do some of them. This is not an “offset”. It’s also not sufficient to overcome the harms; it’s important that there is some other benefit for the action to actually become positive.
My phrasing would be something like “The idea that side effects of speeding up AI capabilities are not as bad as might be assumed at first glance because of the reduction in capabilities overhangs”.