[link] Thoughts on defining human preferences

Kaj_Sotala 31 Mar 2015 10:08 UTC

9 points

https://docs.google.com/document/d/1jDGpIT3gKZQZByO6A036dojRKMv62KEDEfEz87VuDoY/

Abstract: Discussion of how we might want to define human preferences, particularly in the context of building an AI intended to learn and implement those preferences. Starts with actual arguments about the applicability of the VNM utility theorem, then towards the end gets into hypotheses that are less well defended but possibly more important. At the very end, suggests that current hypothesizing about AI safety might be overemphasizing “discovering our preferences” over “creating our preferences”.

torekp 2 Apr 2015 1:47 UTC
2 points
I like a lot of this paper. I disagree with the extent and scope of your “personal hypothesis” in section 8. I just think that, as a matter of empirical fact, there usually is something we “really want”, or would want with more rationality and information. For most people facing your country vs city living decision, for example, I think that trying both lifestyles would lead to a clear winner (and not in a radically path-dependent way).

But I think you’ve got to be right at least to some extent—sometimes we have to create values, we can’t just discover them.

On VNM rationality, let me recommend the book Decision Theory and Rationality by Bermúdez. He raises worries similar to yours, pointing out that decision theory is often offered as (a) a predictive tool, (b) a prescription (follow this recipe to make better decisions!), and/or (c) a normative theory. He then claims that the same theory can’t do all three; yet if it tries to make do with fewer than all three, that raises troubles too.

Thanks for the post.
- Kaj_Sotala 4 Apr 2015 18:44 UTC
  0 points
  Thanks for the recommendation!
djm 1 Apr 2015 1:34 UTC
0 points

“My personal hypothesis is that all human preferences are transient preferences, produced as flashes of positive or negative affect towards some mental concept.”

I like that. People do change preferences, a lot. There was that [not very accurate] quote (US-centric) along the lines of “If you’re under 25 and vote Republican, you have no heart; if you’re over 25 and vote liberal, you have no brains.”

The most difficult part of this is that people have ingrained beliefs and preferences that will make them unhappy if the other side is picked. Rational or not, we can’t pick preferences that make all people happy.

For example

Group1 prefers more taxes for social services and hates social injustice

Group2 hates higher taxes

So for Group1 to be happy, the tax_rate should be between 0.2 and 0.4

but this makes Group2 unhappy, as their preferred tax_rate is between 0.05 and 0.15

There is no value of tax_rate that makes all groups happy.
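
To make the arithmetic concrete, here is a minimal sketch that treats each group’s acceptable tax rate as a closed interval and checks whether the two intervals share any value. The function name `overlap` and the tuple representation are assumptions chosen for illustration; only the numeric ranges come from the example above.

```python
def overlap(a, b):
    """Return the intersection of two closed intervals (low, high), or None if disjoint."""
    low = max(a[0], b[0])    # the intersection starts at the larger lower bound
    high = min(a[1], b[1])   # and ends at the smaller upper bound
    return (low, high) if low <= high else None


# Acceptable tax_rate ranges taken from the example (hypothetical variable names).
group1 = (0.20, 0.40)
group2 = (0.05, 0.15)

print(overlap(group1, group2))  # None: no single tax_rate satisfies both groups
```

If the ranges did overlap, the function would return the shared range, so the same check could be applied pairwise to any other one-dimensional preference.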

Even if this were solved with infinite social services and zero tax rates, the million other disparate preferences, whether rational or not, would cause bigger issues (religious | atheist, vi | emacs, etc.)
[deleted] 31 Mar 2015 21:45 UTC
0 points
Not sure I completely understand, but is this a possibility under your proposal?

AI: “Voila!”

Humans: “Uh, that’s not what we want.”

AI: “No, but it’s what you want to want.”

Hopefully the AI makes us want what we want to want. Then we’d say…

Humans: “Yay! Broccoli for dinner again!”
- RomeoStevens 1 Apr 2015 23:00 UTC
  0 points
  In the present world we are forced, due to resource constraints, to choose between virtuous values and hedonistic values. It isn’t that we want to force ourselves to choose the virtuous; we want to dissolve the need to choose between them so we can satisfy both.