I’m not sure this changes the underlying disagreement. The “desire to be predicted to one-box” is universal. Two-boxers deny the causal link from the prediction to the action. One-boxers deny their own free will, and accept that the prediction must match the behavior.
Talk about “desire” or “intent” or “disposition” is only an attempt to elucidate that the whole question is whether the prediction can actually be correct.
In other words, looking at the results table below, the whole debate is over whether X is reachable in the in-game universe (which may or may not map to our universe in terms of causality).
| Omega predicts \ Your action | One-box | Two-box |
|---|---|---|
| One-box | $1M | X |
| Two-box | X | $1000 |
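A minimal sketch of that framing, purely for illustration (the `TABLE` mapping, the `play` helper, and the `predictor_accuracy` knob are my own toy constructions, not part of the problem statement): if the in-game predictor is perfectly accurate, the prediction always matches the action, so only the diagonal cells are reachable and both X cells never occur.

```python
import random

# The results table above: (Omega's prediction, your action) -> cell contents.
TABLE = {
    ("one-box", "one-box"): "$1M",
    ("one-box", "two-box"): "X",     # disputed cell: prediction wrong
    ("two-box", "one-box"): "X",     # disputed cell: prediction wrong
    ("two-box", "two-box"): "$1000",
}


def play(action: str, predictor_accuracy: float = 1.0) -> str:
    """Return the cell you land in, given how accurate the predictor is.

    predictor_accuracy=1.0 is the in-game premise under debate: the
    prediction always matches the action, so the X cells are unreachable.
    """
    if random.random() < predictor_accuracy:
        prediction = action  # prediction matches behavior
    else:
        prediction = "two-box" if action == "one-box" else "one-box"
    return TABLE[(prediction, action)]


# With a perfect predictor, every run lands on the diagonal:
assert all(play("one-box") == "$1M" for _ in range(1000))
assert all(play("two-box") == "$1000" for _ in range(1000))
```

Dial `predictor_accuracy` below 1.0 and the X cells become reachable again; the dispute is over which of those two regimes the in-game universe (and ours) actually is.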
Content warning: silly.
Here’s Adam. Adam is an agent who believes in one-boxing. He knows arguments for one-boxing, and says them often. Unfortunately, Adam had a stroke last year, and has spatial neglect. His conscious reason does not correctly perceive the controls on the predictor’s machine. When he consciously intends to press the “one-box” button, he reliably instead presses the “two-box” button; when this happens, he thinks he pressed the “one-box” button. Adam is surprised when he is given two boxes (one empty) because his beliefs are not fully hooked up to his perceptions and actions. But so long as the control panel is set up this way, it is correct to predict that Adam will two-box; so that is what the predictor will predict.
Here’s Bob. Bob is an adorable baby. He doesn’t read buttons or think about decision theory. But if you put him in a high-chair in front of a control panel, he will flail his hands at the buttons. Bob is left-handed, and flails his left hand further and more vigorously than his right. As a result, he reliably hits the “one-box” button first. The predictor determines that Bob has a disposition to one-box, and predicts accordingly.
Here’s the predictor’s buddy. He points out to the predictor that it’s weird that left-handed babies are way richer than right-handed babies. “What’s up with that? Do right-handed babies not like winning?”
The predictor shrugs and says, “I don’t care about ‘winning’ either. I just predict button pushes. It’s like content moderation, but with less trauma. I still think they’re gonna replace me with AI, though.”
The predictor’s buddy’s friend hears about this and gets really annoyed because his whole job was improving accessibility for financial user interfaces, but he just got fired because the Niemöller counter reached “people with disabilities” a while ago. And he comes around and gives the predictor what-for about the biases encoded into the control panel.
“Yo,” says the predictor, “I didn’t build the dang buttons. I just work here; and they’ll fire me and replace me with another model if my accuracy drops. Look, I proposed replacing the buttons with a voice interface but they told me there’s this thing called a ‘tube ox’ and the voice model couldn’t tell the difference and started ordering oxygen tubes for everyone as if they were tungsten cubes or something. Adam’s a two-boxer because if I call him a one-boxer and he two-boxes, my ass is the one that gets fired.”
The predictor’s buddy’s friend’s pal drops in. “Wait a minute, I’m totally confused, is this like faith vs. works or something? Is the first guy called Adam because...”
“NO!” said everyone else.
Love this! Great examples to illustrate that your identity as a one-boxer is rooted in your behavior instead of your mind. And pretty cool to think this mirrors theological debates that have gone on for so long.