Let’s suppose that when you believe you have a chance X to succeed, you actually have a chance 0.75 X to succeed (because you can’t stop your beliefs from influencing your behavior). The winning strategy seems to be to believe in 100% success, and thus succeed in 75% of cases. On the other hand, trying too hard to find a value of X that yields exact predictions would lead one to believe in 0% success… and to be right about it. So in this (not so artificial!) situation, a rationalist should prefer success to being right.
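To make the arithmetic concrete, here is a minimal Python sketch, assuming only the toy model above (actual chance = 0.75 × believed chance). The function names are made up for illustration. It suggests that a belief which keeps correcting itself toward the observed success rate shrinks toward 0, while the belief that gives the best actual chance is 1.0.

```python
FACTOR = 0.75  # toy assumption from the comment: actual chance = 0.75 * believed chance


def actual_chance(belief: float) -> float:
    """Actual probability of success given the believed probability."""
    return FACTOR * belief


def calibrate(belief: float, rounds: int) -> float:
    """Repeatedly replace the belief with the success rate it actually produces."""
    for _ in range(rounds):
        belief = actual_chance(belief)
    return belief


def best_belief(steps: int = 100) -> float:
    """Grid-search the belief in [0, 1] that maximizes the actual chance."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=actual_chance)


if __name__ == "__main__":
    print(actual_chance(1.0))   # believing in 100% gives a 75% actual chance
    print(calibrate(1.0, 50))   # a belief that keeps chasing accuracy shrinks toward 0
    print(best_belief())        # the belief that wins most often
```

The only belief that predicts itself exactly is the fixed point X = 0.75 X, that is X = 0, which is why chasing accuracy and chasing success pull in opposite directions here.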
But in real life, unexpected things happen. Imagine that you somehow reprogram yourself to genuinely believe that you have a 100% chance of success… and then someone comes and offers you a bet: you win $100 if you succeed, and lose $10000 if you fail. If you genuinely believe in 100% success, this seems like an offer of free money, so you take the bet. Which you probably shouldn’t.
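A quick sketch of why the bet is a bad idea, assuming the real chance of success is the 75% from the toy model above. The helper name `expected_value` is just an illustration.

```python
def expected_value(p_success: float, win: float, loss: float) -> float:
    """Expected payoff of a bet that pays `win` on success and costs `loss` on failure."""
    return p_success * win - (1.0 - p_success) * loss


WIN, LOSS = 100.0, 10_000.0

believed = expected_value(1.0, WIN, LOSS)   # what the self-deceived agent computes
actual = expected_value(0.75, WIN, LOSS)    # what the bet is worth at the real 75% chance

if __name__ == "__main__":
    print(believed)  # 100.0
    print(actual)    # -2425.0
```

With these stakes the bet only breaks even when the chance of success is 10000/10100, about 99%, so even a mildly overconfident belief is enough to make it look attractive.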
For an AI, a possible solution could be this: Run your own simulation. Make this simulation believe that the chance of success is 100%, while you know that it really is 75%. Give the simulation access to all inputs and outputs, and just let it work. Take control back when the task is completed, or when something very unexpected happens. The only problem is to set the right threshold for “unexpected”: to know the difference between random events that belong to the task and random events outside of the initially expected scenario.
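The idea can be sketched roughly as follows. This is only an illustration under strong assumptions: the `Event` type, the `EXPECTED_KINDS` whitelist, and the function names are all hypothetical, and a fixed whitelist is a crude stand-in for the hard part, which is deciding what counts as “unexpected”.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str           # hypothetical label, e.g. "task_step" or "bet_offer"
    win: float = 0.0    # payoff if the task succeeds
    loss: float = 0.0   # cost if the task fails


# Hypothetical whitelist: the kinds of events the task was expected to produce.
EXPECTED_KINDS = {"task_step"}


def accept(p_success: float, event: Event) -> bool:
    """Accept the event if its expected payoff is non-negative at this belief."""
    return p_success * event.win - (1.0 - p_success) * event.loss >= 0


def inner_decide(event: Event) -> bool:
    """The simulation: believes success is certain."""
    return accept(1.0, event)


def outer_decide(event: Event) -> bool:
    """The original agent: knows the real chance is 75%."""
    return accept(0.75, event)


def supervise(event: Event) -> bool:
    """Let the simulation act on expected events; take control back otherwise."""
    if event.kind in EXPECTED_KINDS:
        return inner_decide(event)
    return outer_decide(event)


if __name__ == "__main__":
    bet = Event("bet_offer", win=100.0, loss=10_000.0)
    print(inner_decide(bet))  # the simulation alone would take the bet
    print(supervise(bet))     # the supervisor steps in and declines
```

In this sketch the overconfident simulation never gets to evaluate the bet, because a bet offer is not on the list of events the task was expected to produce.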
I suppose evolution gave us similar skills, though not as precisely defined as in the case of an AI. An AI simulating itself would need twice as much memory and time; humans instead use compartmentalization as an efficient heuristic. Instead of having one personality that believes in 100% success and another that believes in 75%, humans just convince themselves that the chance of success is 100% but prevent this belief from propagating too far, so they get the benefits of the imaginary belief while avoiding some of its costs. This heuristic is a net advantage, though sometimes it fails, and other people may be able to exploit it: they can use your own illusions to lead you to the logical conclusion that you should take the bet, without arousing any suspicion that something unusual is going on. In this situation there is no original AI that could take back control, so the strategy of false beliefs is accompanied by a rule: “if something is very unusual, avoid it, even if it logically seems like the right thing to do”. This means not trusting your own logic, which in such a situation is very reasonable.
I do this every day, correctly predicting I’ll never succeed at stuff and not getting placebo benefits. I don’t dare try compartmentalization or self-delusion, for the reasons Eliezer has outlined. There are some other complicating factors too. Big problem for me.
Be careful of this sort of argument, any time you find yourself defining the “winner” as someone other than the agent who is currently smiling from on top of a giant heap of utility.
(from “Newcomb’s Problem and Regret of Rationality”)
Yeah, I know that, but I’m not convinced fooling myself won’t result in something even worse. Better ineffectively doing good than effectively doing evil.