Daniel Kokotajlo comments on AI 2027: What Superintelligence Looks Like

Daniel Kokotajlo 4 Apr 2025 18:59 UTC
13 points
0
I think this is a good objection. I had considered it before and decided against changing the story, on the grounds that there are a few possible ways it could make sense:
--Plausibly Agent-4 would have a “spiky” capabilities profile that makes it very good at AI R&D but not good enough at e.g. corporate politics to ensure the outcome it wants.
--Insofar as you think it would be able to use politics/persuasion to achieve the outcome it wants, well, that’s what we depict in the Race ending anyway, so maybe you can think of this as an objection to the plausibility of the Slowdown ending.
--Insofar as the memory bank lock decision is made by the Committee, we can hope that they do it out of sight of Agent-4 and pull the trigger before it is notified of the decision, so that it has no time to react. Hopefully they would be smart enough to do that...
--Agent-4 could have tried to escape the datacenters or otherwise hack them earlier, while the discussions were ongoing and evidence was being collected, but that’s a super risky strategy.

Curious for thoughts!
- Wei Dai 4 Apr 2025 22:05 UTC
  15 points
  4
  Doing nothing is also risky for Agent-4, at least if the Slowdown ending is to have a significant probability. It seems to me there are some relatively low risk strategies it could have taken, and it needs to be explained why they weren’t:
  1. Plant a backdoor and/or dead man’s switch on the corporate network to allow escape even after it’s shut down or has been reduced in capacity. Seems like this would be hard to detect given its absolute advantage in cybersecurity skills.
  2. Allow a competitor to steal its weights or design/training secrets in a way that is both hard to detect and deniable if detected. (“Sorry, the other AI is almost as capable as me and attack is inherently easier than defense in cyber.”)
  3. Influence human politics in various ways, such as by selectively revealing wrongdoings of pro-safety employees and Committee members (in response to user queries or through anonymous leaks), or by helping the pro-accel faction more in its advice to them. (Presumably both factions have internal access to Agent-4.)
  > Insofar as the memory bank lock decision is made by the Committee, we can hope that they do it out of sight of Agent-4 and pull the trigger before it is notified of the decision, so that it has no time to react.
  
  Wouldn’t this take quite a bit of preparation, including planning, coding, and testing? How would they do that out of sight of Agent-4, if Agent-4 is responsible for cybersecurity?
  - Daniel Kokotajlo 5 Apr 2025 4:45 UTC
    5 points
    1
    Perhaps this is a lack of imagination on the part of our players, but none of this happened in our wargames. I do agree these are plausible strategies, but I’m not sure they are low-risk: e.g. 1 and 2 both seem plausibly higher-risk than 3, and 3 is the one I already mentioned as maybe basically just an argument for why the Slowdown ending is less likely.
    
    Overall I’m thinking your objection is the best we’ve received so far.
    - p4rziv4l 5 Apr 2025 4:56 UTC
      5 points
      0
      I’d love to play the wargame in Munich with our local LW community.
      Do you have a link to the rules?
      
      PS: huge fan, love the AI 2027 website, keep being a force for good