In studying CEV, I am interested in methods built for learning a user’s utility function from inconsistent behavior … Nielsen & Jensen (2004) provided two computationally tractable algorithms which handle the problem by interpreting inconsistent behavior as random deviations from an underlying “true” utility function. As far as I know, however, nobody in AI has tried to solve the problem with an algorithm informed by the latest data from neuroeconomics on how human choice is the product of at least three valuation systems …
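To make the shape of that problem concrete, here is a minimal sketch of the general idea only (not Nielsen & Jensen's actual algorithms): treat observed, mutually inconsistent choices as noisy reflections of a hidden "true" utility function, then recover that function by maximum likelihood under a logit (random-utility) choice model. All quantities below are toy values.

```python
import numpy as np

# Minimal sketch: recover a hidden "true" utility function from noisy,
# mutually inconsistent pairwise choices, via a logit choice model.
# This illustrates the general idea, not Nielsen & Jensen's algorithms.

rng = np.random.default_rng(0)
n_items = 5
true_util = rng.normal(size=n_items)  # hidden "true" utilities

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Simulate inconsistent behavior: item i is chosen over item j with
# probability sigmoid(u_i - u_j), so stated preferences sometimes reverse.
pairs, chose_first = [], []
for _ in range(2000):
    i, j = rng.choice(n_items, size=2, replace=False)
    pairs.append((i, j))
    chose_first.append(rng.random() < sigmoid(true_util[i] - true_util[j]))

# Recover utilities by gradient ascent on the log-likelihood.
u = np.zeros(n_items)
for _ in range(300):
    grad = np.zeros(n_items)
    for (i, j), y in zip(pairs, chose_first):
        err = (1.0 if y else 0.0) - sigmoid(u[i] - u[j])
        grad[i] += err
        grad[j] -= err
    u += 0.005 * grad

# Utilities are identifiable only up to an additive constant, so compare
# the orderings; they should agree with the true ones.
print(np.argsort(true_util), np.argsort(u))
```

A neuroeconomics-informed method would presumably replace the single noise model here with something reflecting multiple interacting valuation systems; this sketch only shows the baseline "random deviations from one true utility function" framing.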
Even granting that a better understanding of neuroscience may be somewhat useful in the long run, this kind of thing is clearly of no use. The only way to win is eventually to form an accurate and technically precise understanding of human preference. Any progress that doesn't potentially contribute to that is irrelevant, even if it is much easier to make and appears to study the same thing, but with no hope of the necessary precision (e.g. experimental psychology).
Having an imperfect utility function to hand (which can be improved upon later) might be useful if circumstances force FAI to be launched early.
No, it might not, since “imperfect” in this context essentially means “arbitrary”. When you’ve got a tiny target to hit from several miles away, it’s of no use to spend time practicing with a bow. And if the time comes when your life hangs upon successfully hitting the target, you don’t say that it’s useful that you have a master bowman at hand, you just die.
I was assuming that the imperfect utility functions would at least be accurate enough that they would assign preference to “not dying”. So your life wouldn’t depend on the choice of one of these utility functions vs. the other—it’s just that under the imperfect system, the world would be slightly suckier in a possibly non-obvious way.
Of course, the imperfect function would have to be subjected to some tests to make sure “slightly suckier” doesn’t equate to “extremely sucky” or “dead”. Obviously we don’t really know how to do that part yet.
To use the analogy, I’d expect that we might hit the target that way, just not the bullseye.
Inaccurate preference is a wish that you ask of a superintelligent genie (an indifferent, powerful outcome pump). The problem with wishes is that they get tested against all possible futures the AI can implement, while you rank their similarity to what you want only over the futures that you can (and do) imagine. If there is even one highly implausible (to you) future that implements the wish a little better than the others, that is the future that will happen, even if you would rank it as morally horrible. A wish that has too few points of contact with your preference has many such futures within its rules.
That is the problem with the notion of similarity for AI wishes: it is brittle with respect to the AI's ability to pick out a single possible future that was unrepresentative of how you ranked similarity, and the criterion by which a future actually gets selected doesn't care about what was representative to you.
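A toy sketch of that brittleness (all numbers and functions here are hypothetical, chosen only to illustrate the argument): a wish that tracks your preference perfectly on every future you think to check can still hand the optimizer an extreme, unimagined future where the two come apart.

```python
import numpy as np

rng = np.random.default_rng(1)

def wish(x):
    # The wish you wrote down: simply "more x".
    return x

def true_value(x):
    # What you actually want: more x, but only within the normal range;
    # futures with |x| > 4 are ones you would rank as morally horrible.
    return np.where(np.abs(x) <= 4.0, x, -1000.0)

imagined   = rng.normal(size=100)         # the futures you test the wish on
searchable = rng.normal(size=10_000_000)  # the futures the optimizer can reach

# On the futures you imagine, the wish tracks your preference exactly...
print(np.allclose(wish(imagined), true_value(imagined)))  # True (w.h.p.)

# ...but the argmax lands in the tail you never imagined.
best = searchable[np.argmax(wish(searchable))]
print("wish score:", wish(best))          # ~5.5: maximal wish-satisfaction
print("true value:", true_value(best))    # -1000.0: morally horrible
```

The point of the asymmetry: you evaluated the wish on a hundred typical futures, but the optimizer searches ten million, and the single best wish-satisfier is precisely the unrepresentative one.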
I think you can assign a low preference ranking to “everything that I can’t imagine”. (Obviously that would limit the range of possible futures quite a bit though).
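Continuing the same hypothetical sketch, that conservative move would look something like this: score anything outside the imaginable range as badly as possible, which keeps the optimizer inside familiar territory at the cost of shrinking its reach.

```python
import numpy as np

rng = np.random.default_rng(1)
searchable = rng.normal(size=10_000_000)

def true_value(x):
    return np.where(np.abs(x) <= 4.0, x, -1000.0)

def conservative_wish(x):
    # "Everything I can't imagine" gets the lowest possible ranking.
    return np.where(np.abs(x) <= 4.0, x, -np.inf)

best = searchable[np.argmax(conservative_wish(searchable))]
print("true value:", true_value(best))  # just under 4.0: safe but bounded
```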
In general though, there are (among others) two risks in any value discovery project:
1. You don’t get your results in time
2. You end up missing something that you value
Running multiple approaches in parallel would seem to mitigate both of those risks somewhat.
I agree that a neuroscience-based approach feels the least likely to miss any values, since presumably everything you value is stored in your brain somehow. There are still possibilities for bugs in the extrapolation/aggregation stage, though.
X-Files, Je Souhaite: