The biggest disagreement between me and more pessimistic researchers is that I think gradual takeoff is much more likely than discontinuous takeoff (and in fact, the first, third and fourth paragraphs above are quite weak if there’s a discontinuous takeoff).
It’s been argued before that Continuous is not the same as Slow by any normal standard, so the strategy of ‘dealing with things as they come up’, while more viable under a continuous scenario, will probably not be sufficient.
It seems to me like you’re assuming longtermists are very likely not required at all in a case where progress is continuous. I take continuous to just mean that we’re in a world where there won’t be sudden jumps in capability, or apparently useless systems suddenly crossing some threshold and becoming superintelligent, not where progress is slow or easy to reverse. We could still pick a completely wrong approach that makes alignment much more difficult and set ourselves on a likely path towards disaster, even if the following is true:
So far as I can tell, the best one-line summary of why we should expect a continuous rather than a discontinuous takeoff comes from the interview Paul Christiano gave on the 80k podcast: ‘I think if you optimize AI systems for reasoning, it appears much, much earlier.’
So far as I can tell, Paul’s point is that, absent specific reasons to think otherwise, the prima facie case is that any time we are trying hard to optimize for some criterion, we should expect the ‘many small changes that add up to one big effect’ situation.
Then he goes on to argue that the specific arguments that AGI is a rare case where this isn’t true (like nuclear weapons) are either wrong or aren’t strong enough to make discontinuous progress plausible.
In a world where a continuous but moderately fast takeoff is likely, I can easily imagine doom scenarios that would require long-term strategy or conceptual research early on to avoid, even if none of them involve FOOM. Imagine that the accepted standard for aligned AI is to follow some particular research agenda, like Cooperative Inverse Reinforcement Learning (CIRL), but it turns out that CIRL starts to behave pathologically and tries to wirehead itself as it becomes more capable, and that this is a fairly deep flaw that we can only patch, not avoid.
Let’s say that over the course of a couple of years, failures of CIRL systems start to appear and compound very rapidly until they constitute an existential disaster. Maybe people realize what’s going on, but by then it’s too late: the right move would have been to try some other approach to AI alignment, but that research doesn’t exist and can’t be done anywhere near fast enough. This resembles Paul Christiano’s ‘What Failure Looks Like’.
In the situations you describe, I would still be somewhat optimistic about coordination. But yeah, such situations leading to doom seem plausible, and this is why the estimate is 90% instead of 95% or 99%. (Though note that the numbers are very rough.)