Content warning: silly.
Here’s Adam. Adam is an agent who believes in one-boxing. He knows the arguments for one-boxing and recites them often. Unfortunately, Adam had a stroke last year and has spatial neglect: his conscious mind does not correctly perceive the controls on the predictor’s machine. When he consciously intends to press the “one-box” button, he reliably presses the “two-box” button instead; and when this happens, he believes he pressed the “one-box” button. Adam is surprised when he is given two boxes (one empty), because his beliefs are not fully hooked up to his perceptions and actions. But so long as the control panel is set up this way, it is correct to predict that Adam will two-box; so that is what the predictor will predict.
Here’s Bob. Bob is an adorable baby. He doesn’t read buttons or think about decision theory. But if you put him in a high chair in front of a control panel, he will flail his hands at the buttons. Bob is left-handed, and flails his left hand farther and more vigorously than his right. As a result, he reliably hits the “one-box” button first. The predictor determines that Bob has a disposition to one-box, and predicts accordingly.
Here’s the predictor’s buddy. He points out to the predictor that it’s weird that left-handed babies are way richer than right-handed babies. “What’s up with that? Do right-handed babies not like winning?”
The predictor shrugs and says, “I don’t care about ‘winning’ either. I just predict button pushes. It’s like content moderation, but with less trauma. I still think they’re gonna replace me with AI, though.”
The predictor’s buddy’s friend hears about this and gets really annoyed, because his whole job was improving accessibility for financial user interfaces, but he just got fired; the Niemöller counter reached “people with disabilities” a while ago. And he comes around and gives the predictor what-for about the biases encoded into the control panel.
“Yo,” says the predictor, “I didn’t build the dang buttons. I just work here; and they’ll fire me and replace me with another model if my accuracy drops. Look, I proposed replacing the buttons with a voice interface but they told me there’s this thing called a ‘tube ox’ and the voice model couldn’t tell the difference and started ordering oxygen tubes for everyone as if they were tungsten cubes or something. Adam’s a two-boxer because if I call him a one-boxer and he two-boxes, my ass is the one that gets fired.”
The predictor’s buddy’s friend’s pal drops in. “Wait a minute, I’m totally confused, is this like faith vs. works or something? Is the first guy called Adam because...”
“NO!” says everyone else.
Love this! Great examples to illustrate that your identity as a one-boxer is rooted in your behavior instead of your mind. And pretty cool to think this mirrors theological debates that have gone on for so long.