Recently, I watched Out of This Box. In the musical, they test their nascent AGI on the Christiano-Sisskind test, a successor to the Turing test. What the test involves exactly remains unexplained. Here are my hypotheses.[1]
Sisskind certainly refers to Scott Alexander, and one thing Scott Alexander wrote about something in the vicinity of the Turing test is this post (italics added):
The year is 2028, and this is Turing Test!, the game show that separates man from machine! Our star tonight is Dr. Andrea Mann, a generative linguist at University of California, Berkeley. She’ll face five hidden contestants, code-named Earth, Water, Air, Fire, and Spirit. One will be a human telling the truth about their humanity. One will be a human pretending to be an AI. One will be an AI telling the truth about their artificiality. One will be an AI pretending to be human. And one will be a total wild card. Dr. Mann, you have one hour, starting now.
Notably, the last line in the post is:
MANN: You said a bad word! You’re a human pretending to be an AI pretending to be a human! I knew it!
Christiano is, of course, Paul Christiano. One of the many things that Paul Christiano came up with is HCH:
Consider a human who has access to a question-answering machine. Suppose the machine answers questions by perfectly imitating what the human would do if asked that question.
To make things twice as tricky, suppose the human-to-be-imitated is herself able to consult a question-answering machine, which answers questions by perfectly imitating what the human would do if asked that question…
Let’s call this process HCH, for “Humans Consulting HCH.”
The limit of HCH is an infinite HCH tree where each node is able to consult a subtree rooted at itself, to answer the question coming from its parent node.
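To make the recursion concrete, here is a minimal depth-limited sketch in Python (the real HCH is the infinite limit of this); `human_answer` and `toy_human` are hypothetical stand-ins for the human (or a perfect imitation of them), not anything from Christiano's post:

```python
from typing import Callable

# Hypothetical type: what the human answers, given a way to ask sub-questions.
HumanPolicy = Callable[[str, Callable[[str], str]], str]

def hch(question: str, human_answer: HumanPolicy, depth: int) -> str:
    """Answer `question` by consulting the human, who can in turn consult a copy
    of this very process on sub-questions, up to `depth` levels."""
    if depth == 0:
        # At the leaves the human must answer unaided.
        return human_answer(question, lambda sub_q: "(no further consultation)")
    # Each node consults a subtree rooted at itself, as described above.
    return human_answer(question, lambda sub_q: hch(sub_q, human_answer, depth - 1))

# Toy stand-in for the human: just records how deep the consultations go.
def toy_human(question: str, ask: Callable[[str], str]) -> str:
    return f"answer to '{question}' using [{ask('a sub-question')}]"

print(hch("What is the test?", toy_human, depth=2))
```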
My first hypothesis is that the Christiano-Sisskind test is some recursive shenanigans like the following (sketched in code after the list):
The AI is given the task of imitating several different humans, and its performance is measured, e.g., by testing how well or badly other AIs do at recognizing whether it is an AI or the human it's trying to simulate. So basically a standard Turing test.
Each of those simulated humans now has to imitate several different AIs. Performance is assessed again.
Rinse and repeat, until some maximum tree depth.
The AI’s overall performance is somehow aggregated into the score of the test.
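A minimal sketch of how that recursive scoring might work, with made-up helpers (`fooling_rate`, `simulate`, `opposite_kind`) standing in for the actual discriminators, simulated personas, and persona sampling, and a plain mean as one arbitrary aggregation choice:

```python
import random
from statistics import mean

# Hypothetical placeholders: a real test would query actual models, humans, and judges.
def fooling_rate(imitator, target) -> float:
    """Fraction of discriminator AIs that fail to tell the imitation from the real target."""
    return random.random()

def simulate(imitator, target):
    """The imitator playing `target`; this simulated persona acts at the next level down."""
    return (imitator, target)

def opposite_kind(target) -> list:
    """Simulated humans must imitate AIs and vice versa; return targets of the other kind."""
    return [f"counterpart-of-{target}-{i}" for i in range(2)]

def recursive_turing_score(imitator, targets, depth: int) -> float:
    """Score the imitator on each target, then recurse: each simulated persona must
    itself imitate several targets of the opposite kind, down to `depth` levels."""
    scores = []
    for target in targets:
        scores.append(fooling_rate(imitator, target))
        if depth > 0:
            scores.append(recursive_turing_score(
                simulate(imitator, target), opposite_kind(target), depth - 1))
    return mean(scores)  # one arbitrary way to aggregate the tree into a single test score

print(recursive_turing_score("our-AGI", ["human-A", "human-B"], depth=2))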
An alternative possibility is that the test involves something closer to Debate, but I'm much more unsure what that might look like. Maybe something like the following (again sketched in code after the list):
We are testing our AI against some other AI (think pairwise model comparison).
Each AI is given some finite chain to simulate: The AI is simulating human A simulating AI X simulating human B simulating AI Y … simulating human-or-AI Z.
The two AIs, called Alice and Bob, are debating on two separate argument trees regarding the identities of Alice and Bob. Alice wants to convince Bob that she is Alice-Z and not something else. Bob wants to convince Alice that he is Bob-Z and not something else.
The winner is the one who argues more persuasively (measured by the final judgment of a judge who is either a human or some third AI).
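A rough sketch of that pairwise setup, again with hypothetical placeholders (`argue`, `judge_score`) for whatever the debaters and the judge actually do, and with the two argument trees flattened into a single transcript for simplicity; none of this is from the Debate paper itself:

```python
import random

# Hypothetical placeholders for the actual debaters and the judge.
def argue(debater: str, claim: str, transcript: list) -> str:
    """One debate move: `debater` argues that its claimed identity chain is genuine."""
    return f"{debater} argues: 'I really am {claim}'"

def judge_score(transcript: list, claim: str) -> float:
    """The judge's (a human's or a third AI's) credence that `claim` survived the debate."""
    return random.random()

def identity_debate(alice_chain: str, bob_chain: str, rounds: int = 4) -> str:
    """Alice argues she is exactly the simulation chain she was assigned, Bob does the
    same for his chain; whoever the judge finds more persuasive wins."""
    transcript = []
    for _ in range(rounds):
        transcript.append(argue("Alice", alice_chain, transcript))
        transcript.append(argue("Bob", bob_chain, transcript))
    alice_wins = judge_score(transcript, alice_chain) >= judge_score(transcript, bob_chain)
    return "Alice" if alice_wins else "Bob"

print(identity_debate("human A simulating AI X simulating human Z",
                      "human B simulating AI Y simulating human Z"))
```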
If any of the musical's creators are reading this, I'm curious how close this is to what you had in mind, assuming you had anything specific in mind (it is, of course, totally valid not to have anything specific in mind and just nerd-name-drop Christiano and Scott).