Three possibilities about AI alignment which are orthogonal to takeoff speed and timing
I think “AI alignment difficulty is orthogonal to takeoff speed/timing” is quite conceptually tricky to think through, but I still don’t think it’s true. It’s conceptually tricky because the real truth about ‘alignment difficulty’ and takeoff speed, whatever it is, is probably logically or physically necessary: there aren’t really alternative outcomes there. But we have a lot of logical uncertainty and conceptual confusion, so it still looks as though there are different possibilities. And across those apparent possibilities, I think the two are correlated.
First off, takeoff speed and timing are correlated: if you think HLMI is sooner, you must think progress towards HLMI will be faster, which implies takeoff will also be faster.
The faster we expect takeoff to go, the more likely it is that alignment is also difficult. There are two reasons for this. One is practical: the faster takeoff is, the less time you have to solve the problem before unaligned competitors show up. The second is about the intrinsic difficulty of alignment (which I think is what you’re talking about here).
Much of the reason alignment pessimists like Eliezer think prosaic alignment can’t work is that they expect that when we reach a capability discontinuity / find the core of general intelligence / enter the regime where AI capabilities start generalizing much further than before, whatever we were using to ensure corrigibility will suddenly break on us, probably triggering deceptive alignment immediately with no intermediate phase.
The more gradual and continuous you expect this scaling up to be, the more confident you should be in prosaic alignment, or in alignment by default. There are other variables at play, so the two aren’t perfectly correlated, but they aren’t orthogonal either.
(Also, the whole idea of getting assistance from AI tools on alignment research is in the mix here as well. If there’s a big capability discontinuity when we find the core of generality, one that makes systems generalize really far and also breaks corrigibility, then plausibly, though not necessarily, all the capabilities we need to do useful alignment research in time to avoid unaligned-AI disasters sit on the other side of that discontinuity, creating a chicken-and-egg problem.)
Another way of picking up on this is that many of the analogy arguments used for fast takeoff (for example, that human evolution gives us evidence of giant qualitative jumps in capability) are also used, in very similar form, to argue for difficult alignment (e.g. that when humans suddenly ramped up in intelligence, we also started ignoring the goals of our ‘outer optimiser’).
In the post, I wanted to distinguish between two things you’re now combining: how hard alignment is, and how long we have. And yes, combining these, we get the question of how hard it will be to solve alignment in the time we have before we need a solution. But they are conceptually distinct.
And neither of these directly relates to takeoff speed, which in the current framing is something like the time from when we have near-human systems until they hit a capability discontinuity. You said “First off, takeoff speed and timing are correlated: if you think HLMI is sooner, you must think progress towards HLMI will be faster, which implies takeoff will also be faster.” That last implication might be true, or might not. I agree that there are many worlds in which they are correlated, but there are plausible counterexamples. For instance, we may continue with fast progress, reach HLMI and a near-utopian freedom from almost all work, but then hit a brick wall on scaling deep learning and have another AI winter until we figure out how to build actual AGI, which can then scale to ASI; that new approach could lead to either a slow or a fast takeoff. Or progress may slow to a crawl due to the costs of scaling inputs and compute until we get to AGI, at which point a self-improvement takeoff could be near-immediate, or could continue glacially.
And I agree with your claims about why Eliezer is pessimistic about prosaic alignment, but that’s not why he’s pessimistic about governance; that is a mostly unrelated pessimism.
Like I said in my first comment, the in-practice difficulty of alignment is obviously connected to timelines and takeoff speed.
But you’re right that in this post you’re talking about the intrinsic difficulty of alignment vs. takeoff speed, not the in-practice difficulty.
But those are also still correlated, for the reasons I gave: mainly that a discontinuity is an essential step both in Eliezer-style pessimism and in fast-takeoff views. I’m not sure how close this correlation is.
Do these views come apart in other possible worlds? That is, could you believe in a discontinuity to a core of general intelligence but still think prosaic alignment can work?
I think that potentially you can, if you think there are still enough capabilities in pre-HLMI (pre-discontinuity) AI to help you do alignment research before dangerous HLMI shows up. But prosaic alignment seems to require more assumptions to be feasible given a discontinuity, such as that the discontinuity doesn’t occur before you have all the important capabilities you need to do good alignment research.
I’m not sure I agree that discontinuity and prosaic alignment are compatible, though you make a reasonable case. I do think slower governance approaches are compatible with a discontinuity, if it is far enough away.