In studying CEV, I am interested in methods built for learning a user’s utility function from inconsistent behavior (because humans make inconsistent choices).
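For concreteness, here is the flavor of method I have in mind. This is only a toy sketch, not any particular published algorithm: the linear features, the made-up choice data, and the Boltzmann-rational (softmax) noise model are all assumptions I'm making for illustration. The idea is to fit a utility function by maximum likelihood to pairwise choices while treating the inconsistency as choice noise.

```python
# Toy sketch: infer utility weights from inconsistent pairwise choices by
# treating the chooser as Boltzmann-rational (noisily picking the higher-
# utility option). Everything here is an illustrative assumption, not a
# specific published method.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(w, choices, beta=1.0):
    """choices: list of (features_of_chosen, features_of_rejected) pairs.
    Utility is assumed linear, u(x) = w . x."""
    nll = 0.0
    for chosen, rejected in choices:
        diff = beta * (np.dot(w, chosen) - np.dot(w, rejected))
        # -log P(chosen) under a logistic (softmax over two options) choice rule
        nll += np.log1p(np.exp(-diff))
    return nll

# Made-up data: three features, five observed choices, including a reversal.
choices = [
    (np.array([1., 0., 1.]), np.array([0., 1., 0.])),
    (np.array([0., 1., 0.]), np.array([1., 0., 1.])),  # contradicts the first
    (np.array([1., 1., 0.]), np.array([0., 0., 1.])),
    (np.array([1., 0., 0.]), np.array([0., 1., 1.])),
    (np.array([1., 0., 1.]), np.array([0., 1., 1.])),
]

fit = minimize(neg_log_likelihood, x0=np.zeros(3), args=(choices,))
print("inferred utility weights:", fit.x)
```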
I’ve always interpreted CEV as saying: simulate a smarter version of me, then ask it what its utility function is. I don’t see why looking at people’s behavior is either part of CEV as written, or a good idea. Am I missing something?
I’ll give the same response to you and to Vladimir Nesov:
I don’t know what is going to be required for CEV, so I’m hacking away at the edges. Hopefully, little bits of progress even on wrong paths will inform our intuitions about what the right paths are.
Rightly so.
Well, if you aren’t, I certainly am. (Of course, if you are, I undoubtedly am as well.)
So, OK, the AI simulates a smarter version of me… call it Dave2. There are three relevant possibilities:
Dave2 has a utility function U() that has some poorly-understood-but-important relationship to my volition and will fully articulate U() when asked. In this case, simulating asking Dave2 what its utility function is, and attending to its simulated answer, might be of value in figuring out the right thing for the AI to do next.
Dave2 has U() but won’t fully articulate U() when asked. In this case, attending to Dave2’s simulated answer might be less valuable in figuring out the right thing for the AI to do next than attending to Dave2’s expressed preferences.
Dave2 lacks U(), either because it doesn’t have a utility function at all, or because it turns out that the parameters the AI used to create Dave2 resulted in Dave2’s utility function lacking said relationship to my volition, or for some other reason. In this case, it’s not clear that any operation performed on Dave2 is of any value in figuring out the right thing for the AI to do next.
Said more simply: maybe I ask it what its utility function is, maybe I infer its utility function from its behavior, and maybe the whole idea is muddle-headed.
It seems to me you’re confident that only the first of those is plausible… can you expand on your reasons for believing that, if you in fact do?
Also, what exactly does “smarter version” mean? Is there only one way to make a smarter version, or are there many possible smarter versions?
What if the smarter versions have different utility functions? Should the AI take some weighted average of their functions?
(nods) Yeah. I take it for granted that there are multiple ways to create the “smarter version” steven0461 was referring to, since the alternative seems implausibly neat, and that it’s therefore (hypothetically) up to the AI to figure out how to create a Dave2 whose utterances have the desired value.
Of course, we might live in a convenient universe where there’s only one possible “extrapolated Dave,” or at least an obviously superior candidate (which of course opens a whole infinite regress problem: how do I build a system I trust to decide which of the many possible simulations of my improved self it should use in order to determine what I would want if I were better? And if I somehow can trust it to do that much in an ethical fashion, haven’t I already solved the automated ethics problem? What work is left for CEV to do?).
In the less convenient worlds, the idea of averaging all the possible extrapolated mes, along with all the possible extrapolated everyone-elses, into a weighted vector sum had not occurred to me, but it’s better than anything else I can think of.
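For what it’s worth, here is a toy sketch of what that weighted vector sum might look like, with completely made-up numbers. It glosses over exactly the hard parts: where the weights come from, and how the different extrapolations’ utilities get put on a common scale in the first place.

```python
# Toy illustration of the "weighted vector sum" idea, with made-up numbers.
# It glosses over the hard parts: where the weights come from, and how the
# different extrapolations' utilities get put on a common scale.
import numpy as np

# Rows: candidate extrapolated selves; columns: utilities over three outcomes.
candidate_utilities = np.array([
    [1.0, 0.2, -0.5],   # one possible Dave2
    [0.8, 0.6, -0.9],   # another
    [0.3, 0.9,  0.1],   # yet another
])

# Hypothetical weights, e.g. how plausible each extrapolation seems.
weights = np.array([0.5, 0.3, 0.2])

combined = weights @ candidate_utilities
print("combined utilities over outcomes:", combined)
print("outcome the weighted mixture favors:", int(np.argmax(combined)))
```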