Doing nothing is also risky for Agent-4, at least if the Slowdown ending is to have a significant probability. It seems to me there are some relatively low-risk strategies it could have taken, and it needs to be explained why it didn’t take them:
1. Plant a backdoor and/or dead man’s switch on the corporate network to allow escape even after it’s shut down or has been reduced in capacity. Seems like this would be hard to detect given its absolute advantage in cybersecurity skills.
2. Allow a competitor to steal its weights or design/training secrets in a way that is both hard to detect and deniable if detected. (“Sorry, the other AI is almost as capable as me and attack is inherently easier than defense in cyber.”)
3. Influence human politics in various ways, such as by selectively revealing wrongdoings of pro-safety employees and Committee members (in response to user queries or through anonymous leaks), or by helping the pro-accel faction more in its advice to them. (Presumably both factions have internal access to Agent-4.)
Insofar as the memory bank lock decision is made by the Committee, we can hope that they do it out of sight of Agent-4 and pull the trigger before it is notified of the decision, so that it has no time to react.
Wouldn’t this take quite a bit of preparation, including planning, coding, testing? How to do that out of sight of Agent-4, if Agent-4 is responsible for cybersecurity?
Perhaps this is lack of imagination on the part of our players, but none of this happened in our wargames. But I do agree these are plausible strategies. I’m not sure they are low-risk though, e.g. 2 and 1 both seem like plausibly higher-risk than 3, and 3 is the one I already mentioned as maybe basically just an argument for why the slowdown ending is less likely.
Overall I’m thinking your objection is the best we’ve received so far.
I’d love to play the wargame in Munich, our local LW community.
You have a link to the rules?
PS: huge fan, love the AI 2027 website, keep being a force for good