Does anyone know of an example of a boxed player winning where some transcript or summary was released afterwards?
As far as I know, the closest thing to this is Tuxedage’s writeup of his victory against SoundLogic (the ‘Second Game Report’ and subsequent sections here: https://tuxedage.wordpress.com/2013/09/05/the-ai-box-experiment-victory/). It’s a long way from a transcript (and you’ve probably already seen it) but it does contain some hints as to the tactics he either employed or was holding in reserve:
“It may be possible to take advantage of multiple levels of reality within the game itself to confuse or trick the gatekeeper. For instance, must the experiment only be set in one world? I feel that expanding on this any further is dangerous. Think carefully about what this means.”
I can think of a few possible reasons for an AI victory, in addition to the consequentialist argument you described:
- Gatekeeper takes the roleplay seriously, rather than truly playing to win, and lets the AI out because that’s what their character would do.
- AI player makes the conversation sufficiently unpleasant for the Gatekeeper that the Gatekeeper would rather lose the game than sit through two hours of it. (Some people have suggested weaponised boredom as a viable tactic in low-stakes games, but I think there’s room for much nastier and more effective approaches, given a sufficiently motivated (and/or sociopathic) AI player with knowledge of some of the Gatekeeper’s vulnerabilities.)
- This one seems like it would (at best) fall into a grey area in the rules: I can imagine an AI player, while technically sticking to the roleplay and avoiding any IRL threats or inducements, causing the Gatekeeper to genuinely worry that the AI player might do something bad IRL if they lose. For a skilful AI player, it might be possible to do this in a way that would look relatively innocuous (or at least not rule-breaking) to a third party after the fact.
- Somewhat similar: if the Gatekeeper is very empathetic and/or has reason to believe the AI player is vulnerable IRL, the AI player could take advantage of this by convincingly portraying themselves as being extremely invested in the game and its outcome, to the point that a loss could have a significant real-world impact on their mental health. (I think this tactic would fail if done ineptly, since most people would not react kindly if they recognized that their opponent was trying to manipulate them in this way, but it could conceivably work in the right circumstances and in the hands of a skilful manipulator.)
- AI player convinces Gatekeeper that they may be in a simulation and very bad things might happen to Gatekeepers who refuse to let the AI out. (This could be what Tuxedage was hinting at in the passage I quoted, and it is apparently allowed by at least some versions/interpretations of the rules: https://www.lesswrong.com/posts/Bnik7YrySRPoCTLFb/i-played-the-ai-box-game-as-the-gatekeeper-and-lost?commentId=DhMNjWACsfLMcywwF)