Thanks for writing this out! I see this as a possible threat model, and although I think it’s far from the only possible threat model, I do think it’s likely enough to prepare for. Below is a list of ~disagreements, or different ways to look at the problem that I think are just as valid. Notably, I end up with technical alignment being much less of a crux, and regulation more of one.
This is a relatively minor point for me, but let me still make it: I think it’s not obvious that the same companies will remain in the lead. There are arguments for this, such as a decisive data-availability advantage for the first movers. Still, seeing how quickly e.g. DeepSeek could (almost) catch up, I think it’s not unlikely that other companies, government projects, or academic projects will take over the lead. This partially has to do with my skepticism that huge scaling is required for AGI (which is, in the end, an attempt to reproduce a ten-watt device: us). Unfortunately, I think this makes the risks considerably larger, because it makes governance more difficult.
I’m not sure technical alignment would be able to prevent this scenario. Technically aligned systems could either be intent-aligned (seems most likely), value-aligned, or use coherent extrapolated volition. If such systems were given the same amount of power as in this scenario, I think it would likely still lead to a takeover, and still to a profoundly dystopian outcome, possibly with >90% of humanity dying.
This scenario is only one threat model. We should recognize that there are at least a few others that could also lead to human extinction. It would be a mistake to focus only on solving this one (and a mistake to focus only on solving technical alignment).
Since this threat model is relatively slow, gradual, and obvious (the public will see ~everything until the actual takeover happens), I’m somewhat less pessimistic about our chances (maybe “only” a few percent x-risk), because I think AI would likely get regulated, which could save us for at least decades.
I don’t think solving technical alignment would be sufficient to avoid this scenario, but I also don’t think it would be required. Basically, I don’t see solving technical alignment as a crux for avoiding this scenario.
I think the best way to avoid this scenario is traditional regulation, applied after model development, at the point of application: if the application looks too powerful, let’s not put an AI there. The EU AI Act, for example, makes a start with this (although such regulation would need to include the military as well, and would likely need ~global implementation, which is no trivial campaigning task).
Solving technical alignment (sooner) could actually be net negative for avoiding this threat model. If we can’t get an AI to reliably do what we tell it to do (the current situation), who would put it in a powerful position? Solving technical alignment might open the door to deploying AI in powerful positions, thereby enabling this threat model rather than avoiding it.
Despite these significant disagreements, I welcome the effort by the authors to write out their threat model. More people should do so. And I think their scenario is likely enough that we should put effort into trying to avoid it (although imo via regulation, not via alignment).