There is a situation I’ve thought of in which functional decision theory, according to my understanding of it, does poorly. I might just be making some sort of mistake, but I tried to be pretty careful when reading Eliezer’s paper on functional decision theory, and it still seems to be a problem. I’m interested in what others think of this.
The situation is just like Newcomb’s problem, except that the player is a superintelligent AI who is aware of the exact mental makeup of the predictor (the one considering placing things in the boxes) and can infer the predictor’s choices with what is effectively certainty. The predictor is just a regular, non-superintelligent creature that knows the player is superintelligent and uses functional decision theory.
In this situation, there doesn’t seem to be a logical connection between what the superintelligence outputs and its prediction of what the predictor does. I mean, the superintelligence can exactly infer what the predictor does without referencing its own action, so it doesn’t seem like the superintelligence knowing what it itself does would really be informative.
So, suppose the superintelligence predicts that the predictor puts money in both boxes, and would believe this prediction no matter what the superintelligence decides to do. In this situation, I don’t see any reason for it not to take both boxes.
And the predictor reasons, “Since the AI can predict my exact output with effective certainty, eliminating any logical connection between its choice and the contents of the boxes, the AI has no reason not to just take both boxes if they contain money. Therefore I predict the AI will two-box, so I won’t put money in either box.”
And then the superintelligent AI gets $0.
I don’t think that’s what a superintelligence is supposed to get. And if the AI didn’t have the knowledge and power to predict the predictor’s output exactly, then a logical connection could have been preserved and the AI could potentially have gotten the money. But I don’t think more knowledge and more intelligence are supposed to make agents do worse.
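To make the payoff structure concrete, here is a minimal Python sketch of the scenario as I understand it; the dollar amounts and the “fill both boxes or leave both empty” rule are illustrative assumptions on my part, and the function names are hypothetical.

```python
def payout(choice: str, filled: bool) -> int:
    # Assumed payoffs: $1,000,000 in the opaque box plus $1,000 in the other
    # box when the predictor fills them, and $0 in both boxes otherwise.
    if not filled:
        return 0
    return 1_001_000 if choice == "two-box" else 1_000_000

def predictor_fills(predicted_choice: str) -> bool:
    # The ordinary predictor fills the boxes only if it predicts one-boxing.
    return predicted_choice == "one-box"

def agent(filled: bool) -> str:
    # The AI simulates the predictor first, so it already knows `filled`
    # before choosing; with no perceived logical connection to the
    # prediction, taking both boxes dominates.
    return "two-box"

# The predictor knows the AI reasons this way, so it predicts two-boxing:
filled = predictor_fills(predicted_choice="two-box")
print(payout(agent(filled), filled))   # 0

# A policy the predictor would expect to one-box does far better:
filled = predictor_fills(predicted_choice="one-box")
print(payout("one-box", filled))       # 1000000
```

The point of the sketch is just that the outcome is fixed by what the predictor expects, not by anything the AI can do after it has already simulated the predictor.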
This is known as “agent simulates predictor”. There has been plenty of discussion of this problem. I’m currently feeling too lazy to try to summarize or link all the approaches, but here are some thoughts I had about it via my infra-Bayesian theory.