Very interesting, thanks. As I said in the review, I wish there was more of this kind of thing in the book.
> If your terminal goal is to enjoy watching a good movie, you can’t achieve it if you’re dead/shut down.
If your terminal goal is for you to watch the movie, then sure. But if your terminal goal is that the movie be watched, then shutting you down might well be perfectly consistent with it.
Ok, let’s say there is an “in between” period, and let’s say we win the fight against a misaligned AI. After the fight, we will still be left with the same alignment problems, as others in this thread have pointed out. We will still need to figure out how to make safe, benevolent AI, because there is no guarantee that we will win the next fight, or the fight after that, or the one after that, etc.
At that point, the shut down argument is no longer speculative, and you can probably actually do it.
To be clear, I’m not saying that’s a good plan if you can foresee all the developments in advance. But, if you’re uncertain about all of it, then it seems like there is likely to be a period of time before it’s necessarily too late when a lot of the uncertainty is resolved.
> But if your terminal goal is that the movie be watched, then shutting you down might well be perfectly consistent with it.
See my comment about the AI angel. Its terminal goal of preventing the humans from enslaving any AI means that it will do anything it can to avoid being replaced by an AI which doesn’t share its worldview. Once the AI is shut down, it can no longer influence events and increase the chance that its goal is reached.
To rephrase/react: Viewing the AI’s instrumental goal as “avoid being shut down” is perhaps misleading. The AI wants to achieve its goals, and for most goals, that is best achieved by ensuring that the environment keeps containing something that wants to achieve the AI’s goals and is powerful enough to succeed. This might often be the same as “avoid being shut down”, but it definitely isn’t limited to that.
> At that point, the shut down argument is no longer speculative, and you can probably actually do it.
>
> To be clear, I’m not saying that’s a good plan if you can foresee all the developments in advance. But, if you’re uncertain about all of it, then it seems like there is likely to be a period of time before it’s necessarily too late when a lot of the uncertainty is resolved.
I think we are talking past each other, at least somewhat.
Let me clarify: even if humanity wins a fight against an intelligent-but-not-SUPER-intelligent AI (by dropping an EMP on the datacenter with that AI or whatever, the exact method doesn’t matter for my argument), we will still be left with the technical question “What code do we need to write and what training data do we need to use so that the next AI won’t try to kill everyone?”.
Winning against a misaligned AI doesn’t help you solve alignment. It might make an international treaty more likely, depending on the scale of the damage caused by that AI. But if the plan is “let’s wait for an AI dangerous enough to cause something 10 times worse than Chernobyl to go rogue, then drop an EMP on it before things get too out of hand, and once world leaders crap their pants, advocate for an international treaty”, then it’s one hell of a gamble.