This may seem plausible because new evidence about the technical difficulty of alignment was the main reason MIRI pivoted away from their plan, but I want to argue that even without this information, there were good enough arguments back then to conclude that the plan was bad.
I think it’s not totally implausible that one could have correctly called, well in advance, that the problem itself would be too hard, without actually seeing much evidence from the results of MIRI’s and other attempts. I think one could consider things like “the mind is very complex and illegible” and “you have to have a good grasp on the sources of capabilities because of reflective self-modification” and “we have no idea what values are or how intelligence really works”, and maybe get justified confidence; I’m not sure.
But it seems like you’re not arguing that in this post, but instead saying that it was a bad plan even if alignment was easy? I don’t think that’s right, given the stakes and given the difficulty of all plausible plans. I think you can do the thing of trying to solve it, and not be overconfident, and if you do solve it then you use it to end acute risk, but if you don’t solve it you don’t build it. (And indeed IIUC much of the MIRI researcher blob pivoted to other things due to alignment difficulty.) If hypothetically you had a real solution, and triple quadruple checked everything, and did a sane and moral process to work out governance, then I think I’d want the plan to be executed, including “burn all the GPUs” or similar.
> If hypothetically you had a real solution, and triple quadruple checked everything, and did a sane and moral process to work out governance, then I think I’d want the plan to be executed, including “burn all the GPUs” or similar.
First note that the context of my old debate was MIRI’s plan to build a Friendly (sovereign) AI, not the later “burn all the GPUs” Task AI plan. If I were debating the Task AI plan, I’d probably emphasize the “roll your own metaethics” aspect a bit less (although even the Task AI would still have philosophical dependencies like decision theory), and emphasize more that there aren’t good candidate tasks for the AI to do. E.g. “burn all the GPUs” wouldn’t work because the AI race would just restart the day after with everyone building new GPUs. (This is not Eliezer’s actual task for the Task AI, but I don’t remember his rationale for keeping the actual task secret, so I don’t know if I can talk about it here. I think the actual task has similar problems though.)
My other counterarguments all apply as written, so I’m confused that you seem to have entirely ignored them. I guess I’ll reiterate some of them here:
What’s a sane and moral process to work out governance? Did anyone write something down? It seems implausible to me, given other aspects of the plan (i.e., speed and secrecy). If one’s standard for “sane and moral” is something like the current Statement on Superintelligence, then it just seems impossible.
“Triple quadruple checked everything” can’t be trusted when you’re a small team aiming for speed and secrecy. There are instances where widely deployed, supposedly “provably secure” cryptographic algorithms and protocols (with proofs published and reviewable by the entire research community, which has clear incentives to find and publish any flaws) turned out years later to be insecure, because some implicit or explicit assumption used by the proof (e.g., about what the attacker is allowed to do) turned out to be wrong. And that’s a much better understood, inherently simpler problem that has been studied for decades, with public adversarial review processes that mitigate human biases far better than a closed small team can.
See also items 2 and 5 in my OP.
> and not be overconfident
I didn’t talk about this in the OP (due to potentially distracting from other more important points), but I think Eliezer at least was/is clearly overconfident, judging from a number of observations including his confidence in his philosophical positions. (And overconfidence is just quite hard to avoid in general.) We’re lucky, in a way, that his ideas for building FAI or a safe Task AI fell wide of the mark rather than almost working; otherwise I think MIRI itself would have had a high chance of destroying the world.
Well, I meant to address them in a sweeping / not very detailed way. Basically I’m saying that they don’t seem like the sort of thing that should necessarily in real life prevent one from doing a Task-ish pivotal act. In other words, yes, {governance, the world not trusting MIRI, extreme power concentration} are very serious concerns, but in real life I would pretty plausibly—depending on the specific situation—say “yeah ok you should go ahead anyway”. I take your point about takeover-FAI; FWIW I had the impression that takeover-FAI was more like a hypothetical for purposes of design-thinking, like “please notice that your design would be really bad if it were doing a takeover; therefore it’s also bad for pivotal-task, because pivotal-task is quite difficult and relies on many of the same things as a hypothetical safe-takeover-FAI”.