What I meant was that the AI would keep inside it a predicate Will_Pearson_would_regret_wish (based on what I would regret), and apply that to the universes it envisages while planning. A metaphor for what I mean is the AI telling a virtual copy of me all the stories of the future, from various viewpoints, and the virtual me not regretting the wish. Of course I would expect it to be able to distill a non-sentient version of the regret predicate.
So if it invented a scenario where it killed the real me, the predicate would still exist and say false. It would be able to predict this, and so not carry out this plan.
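A minimal sketch of that planning loop, under loud assumptions: the plan names, the toy world model, and the function names (`passes_no_regret_test`, `predict_future`, `admissible`) are all hypothetical, and "the predicate says false" is read here as "this future fails the no-regret test". The only point it illustrates is that the test is evaluated on the predicted future inside the planner, so it still returns an answer for futures in which the real Will no longer exists.

```python
from typing import Any, Callable, Dict, List

Future = Dict[str, Any]


def passes_no_regret_test(future: Future) -> bool:
    """Hypothetical stand-in for the distilled, non-sentient regret predicate.

    Toy rule (an assumption, not the real predicate): the future fails if the
    real Will is killed or if the wish backfires.
    """
    return not future.get("real_will_killed", False) and not future.get(
        "wish_backfires", False
    )


def predict_future(plan: str) -> Future:
    """Hypothetical world model: maps each candidate plan to an envisaged future."""
    toy_model: Dict[str, Future] = {
        "grant wish as intended": {"real_will_killed": False, "wish_backfires": False},
        "grant wish, then kill Will": {"real_will_killed": True, "wish_backfires": False},
        "grant wish literally": {"real_will_killed": False, "wish_backfires": True},
    }
    return toy_model[plan]


def admissible(
    plans: List[str],
    predict: Callable[[str], Future],
    test: Callable[[Future], bool],
) -> List[str]:
    """Keep only plans whose predicted future passes the test.

    The test runs on the prediction, not on the real person, so killing the
    real Will does not switch it off.
    """
    return [plan for plan in plans if test(predict(plan))]


candidate_plans = [
    "grant wish as intended",
    "grant wish, then kill Will",
    "grant wish literally",
]
chosen = admissible(candidate_plans, predict_future, passes_no_regret_test)
```

Everything hard is hidden inside the two stand-in functions; the filter itself is trivial once a trustworthy predicate exists.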
If you want to, generalize to humanity. This is not quite the same as CEV, as the AI is not trying to figure out what we want when we would be smarter, but what we don’t want when we are dumb. Call it coherent no regret, if you wish.
CNR might be equivalent to CEV if humanity wishes not to feel regret in the future for the choice. That is, if we would regret being in a future where people regret the decision, even though current people wouldn’t.
So let’s suppose we’ve created a perfect zombie simulated!Will. A few immediate problems:
A human being is not capable of understanding every situation. If we modified the simulation of you so that it could understand any situation an AI could conceive of, we would in the process radically alter the psychology of simulated!Will. How do we know which cognitive dispositions of simulated!Will to change, and which dispositions not to change, in order to preserve the ‘real Will’ (i.e., an authentic representation of what you would have meant by ‘Will Pearson would regret wish’) in the face of a superhuman enhancement? You might intuit that it’s possible to simply expand your information-processing capabilities without altering who you ‘really are,’ but real-world human psychology is complex, and our reasoning and perceiving faculties are not in reality wholly divorceable from our personality.
We can frame the problem as a series of dilemmas: We can either enhance simulated!Will with a certain piece of information (which may involve fundamentally redesigning simulated!Will to have inhuman information-processing and reasoning capacities), or we can leave simulated!Will in the dark on this information, on the grounds that the real Will wouldn’t have been willing or able to factor it into his decision. (But the ‘able’ bit seems morally irrelevant—a situation may be morally good or bad even if a human cannot compute the reason or justification for that status. And the ‘willing’ seems improbable, and hard to calculate; how do we go about creating a simulation of whether Will would want us to modify simulated!Will in a given way, unless Will could fairly evaluate the modification itself without yet being capable of evaluating some of its consequences? How do we know in advance whether this modification is in excess of what Will would have wanted, if we cannot create a Will that both possesses the relevant knowledge and is untampered-with?)
Along similar lines, we can ask: Does mere exposure to certain facts unfairly dispose Will to choose certain policies the AI wants, even without redrafting the fundamental architecture of Will’s cognition? In other words, can an AI brainwash its simulated!Will by exposing simulated!Will specifically to the true information it knows would cause Will to assent to whatever proposition the AI wants? Humans are irrational, so we should expect there to be ‘hacks’ of this sort in any reasonable model; and since our biases are not discrete, i.e., it is not always possible to cleanly distinguish a biased decision from an unbiased one, the AI might not even be capable of determining whether it is brainwashing or unfairly influencing simulated!Will as opposed to merely informing or educating simulated!Will.
More generally: People can be wrong about what optimizes for their values. simulated!Will may perfectly reflect what Will would think, but not what would actually produce the most well-being for Will. I can be completely convinced that a certain situation optimizes for my values, and be wrong. But it is not an easy task to isolate my values (my ‘true’ preferences) from my stated preferences; certainly simulated!Will himself will not be an inerrant guide to this distinction. So this is a problem both for knowing how to build the simulation (i.e., what traits to exclude or include), and for how to assess when we’re done whether the simulation is serving as a useful guide to what Will actually prefers, as opposed to just being a guide to what Will thinks he prefers.
Bogdan Butnaru: