The Preference Utilitarian’s Time Inconsistency Problem

Wei Dai 15 Jan 2010 0:26 UTC

34 points

In May of 2007, DanielLC asked at Felicifia, an “online utilitarianism community”:

If preference utilitarianism is about making peoples’ preferences and the universe coincide, wouldn’t it be much easier to change peoples’ preferences than the universe?

Indeed, if we were to program a super-intelligent AI to use the utility function U(w) = sum of w’s utilities according to people (i.e., morally relevant agents) who exist in world-history w, the AI might end up killing everyone who is alive now and creating a bunch of new people whose preferences are more easily satisfied, or just use its super intelligence to persuade us to be more satisfied with the universe as it is.
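The failure mode can be seen in a toy model. This is only a sketch: the two worlds, the people in them, and the satisfaction numbers are invented for illustration and do not come from the post.

```python
# Toy model: a world-history is a dict mapping each person who exists in it
# to how well that person's preferences are satisfied (hypothetical numbers).

def U(world):
    """Sum of utilities according to everyone who exists in `world`."""
    return sum(world.values())

# Option 1: keep the current people and improve the universe at great effort.
improve_universe = {"alice": 0.6, "bob": 0.5}

# Option 2: replace the current people with new people who are trivially satisfied.
replace_people = {"new_1": 1.0, "new_2": 1.0, "new_3": 1.0}

worlds = {"improve_universe": improve_universe, "replace_people": replace_people}

# A maximizer of U simply picks the world-history with the larger sum.
best = max(worlds, key=lambda name: U(worlds[name]))
print(best)  # replace_people
```

Nothing in U refers to who exists now, so the replaced population counts exactly as much as the original one.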

Well, that can’t be what we want. Is there an alternative formulation of preference utilitarianism that doesn’t exhibit this problem? Perhaps. Suppose we instead program the AI to use U’(w) = sum of w’s utilities according to people who exist at the time of decision. This solves Daniel’s problem, but introduces a new one: time inconsistency.

The new AI’s utility function depends on who exists at the time of decision, and as that time changes and people are born and die, its utility function also changes. If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T₀, where T₀ is a constant representing the time of self-modification.
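A minimal sketch of the inconsistency, assuming three hypothetical people, two candidate world-histories, and two decision times (none of these names or numbers are from the post):

```python
# Hypothetical utility each person assigns to two candidate world-histories.
utilities = {
    "alice": {"w_a": 1.0, "w_b": 0.0},
    "bob":   {"w_a": 0.0, "w_b": 0.4},
    "carol": {"w_a": 0.0, "w_b": 1.0},
}

# Who exists at each decision time: alice dies and carol is born in between.
alive = {0: {"alice", "bob"}, 1: {"bob", "carol"}}

def U_prime(world, decision_time):
    """U': sum over the people who exist at the time of decision."""
    return sum(utilities[p][world] for p in alive[decision_time])

def make_U_double_prime(t0):
    """U'': the same sum, with the population frozen at time t0."""
    frozen = frozenset(alive[t0])
    def U_double_prime(world):
        return sum(utilities[p][world] for p in frozen)
    return U_double_prime

def best(score):
    return max(["w_a", "w_b"], key=score)

choice_at_0 = best(lambda w: U_prime(w, 0))  # scores: w_a = 1.0, w_b = 0.4
choice_at_1 = best(lambda w: U_prime(w, 1))  # scores: w_a = 0.0, w_b = 1.4
locked_in = best(make_U_double_prime(0))     # same ranking as at time 0

print(choice_at_0, choice_at_1, locked_in)   # w_a w_b w_a
```

Judged by the time-0 function, the time-1 agent would steer toward the worse world-history, which is why the time-0 agent prefers to freeze the population in its own utility function.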

The AI is now reflectively consistent, but is this the right outcome? Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time? Presumably, if you’re a utilitarian in the first place, this is probably not the kind of utilitarianism that you’d want to subscribe to.

So, what is the solution to this problem? Robin Hanson’s approach to moral philosophy may work. It tries to take into account everyone’s preferences (those who lived in the past, those who will live in the future, and those who have the potential to exist but don’t), but I don’t think he has worked out (or written down) the solution in detail. For example, is the utilitarian AI supposed to sum over every logically possible utility function and weigh them equally? If not, what weighting scheme should it use?

Perhaps someone can follow up Robin’s idea and see where this approach leads us? Or does anyone have other ideas for solving this time inconsistency problem?

Utility Functions Consequentialism

wedrifid 15 Jan 2010 2:03 UTC
19 points

The AI is now reflectively consistent, but is this the right outcome?

Yes.

Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time?

‘Should’? Us deciding what should be is already us pretending, hoping or otherwise counterfactually assuming for the purposes of discussion that we can choose the fate of the universe. It so happens that many people that happen to be alive at this arbitrary point in time have preferences with altruistic components that could consider future agents. Lucky them, assuming these arbitrary agents get their way.

Presumably, if you’re a utilitarian in the first place, this is probably not the kind of utilitarianism that you’d want to subscribe to.

That may explain my disagreement (or, as phrased, unexpected agreement). I tend to consider utilitarianism (as typically described) to be naive, verging on silly. The U’’ option you describe at least seems to have the coherency required to be implemented in practice without a catastrophic or absurd result.
- Wei Dai 15 Jan 2010 9:17 UTC
  3 points
  Parent
  Since you found our agreement unexpected, it may give you a better perspective on this post to know that while it’s mostly addressed to utilitarians, I’m not a utilitarian myself. I do have a certain amount of intellectual sympathy towards utilitarianism, and would like to see its most coherent positions, and hear its strongest arguments, so my post was written in that spirit.
  
  I’d also be quite interested in exploring other potentially viable approaches to moral philosophy. Given that you consider utilitarianism to be naive and verging on silly, what approaches do you find promising?
  - wedrifid 16 Jan 2010 0:52 UTC
    7 points
    Parent
    
    Since you found our agreement unexpected
    
    Let’s say I agree with the specific statements, which would be unexpected by the context if I were a utilitarian. I wouldn’t dream of accusing you of being a utilitarian given how much of an insult that would be given my position.
    
    I’d also be quite interested in exploring other potentially viable approaches to moral philosophy. Given that you consider utilitarianism to be naive and verging on silly, what approaches do you find promising?
    
    “The universe should be made to maximise my utility (best satisfy my preferences over possible states of the universe)” is my moral philosophy. From that foundation altruism and cooperation considerations come into play. Except that some people define that as not a moral philosophy.
    - timtyler 3 Jun 2010 23:36 UTC
      7 points
      Parent
      That seems to be this: http://en.wikipedia.org/wiki/Ethical_egoism—which I would classify as some kind of moral philosophy.
      
      It seems to be much more biologically realistic than utilitarianism. Utilitarianism appears to be an ethical system based on clearly signalling how unselfish and nice you are. The signal seems somewhat tarnished by being pretty unbelievable, though. Do these people really avoid nepotism and favouring themselves? Or are they kidding themselves about their motives in the hope of deceiving others?
    - amcknight 19 Nov 2011 1:41 UTC
      0 points
      Parent
      It sounds silly and arbitrary when you discharge the references:
      
      “The universe should be made to maximise wedrifid’s utility (best satisfy wedrifid’s preferences over possible states of the universe)”
      
      Why not replace “wedrifid” with “amcknight”? The fact that you happen to be yourself doesn’t sound like a good enough reason.
      - pedanterrific 19 Nov 2011 1:55 UTC
        7 points
        Parent
        Is there some reason why moral philosophy can’t be arbitrary?
        amcknight 21 Nov 2011 20:34 UTC
        2 points
        Parent
        Yes. If you want your beliefs to pay rent, then you need to choose between features of reality rather than simply choose arbitrarily. Is there anything else that you believe arbitrarily? Why make an exception for moral philosophy? Reminds me of Status Quo Bias or keeping faith even after learning about other religions. Can you name a relevant difference?
      - wedrifid 19 Nov 2011 3:18 UTC
        4 points
        Parent
        
        It sounds silly and arbitrary when you discharge the references:
        
        That sounds like a good description of moralizing to me!
  - timtyler 3 Jun 2010 23:29 UTC
    0 points
    Parent
    http://en.wikipedia.org/wiki/Utilitarianism#Criticism_and_defense goes over some of the common issues.
mattnewport 15 Jan 2010 1:59 UTC
8 points
Given that most attempts at thinking through the consequences of utilitarian ethics resemble a proof by contradiction that utilitarianism cannot be a good basis for ethics, it surprises me how many people continue to embrace it and try to fix it.
- magfrump 15 Jan 2010 17:59 UTC
  1 point
  Parent
  Can you provide a link to an academic paper or blog post that discusses this in more depth?
  - Jack 16 Jan 2010 6:40 UTC
    0 points
    Parent
    The kind of thought experiments (I think) Matt is referring to are so basic I don’t know of any papers that go into them in depth. They get discussed in intro level ethics courses. For example: A white woman is raped and murdered in the segregation-era Deep South. Witnesses say the culprit was black. Tensions are high, and there is a high likelihood that race riots will break out and whites will start killing blacks. Hundreds will die unless the culprit is found and convicted quickly. There are no leads, but as police chief/attorney/governor you can frame an innocent man and have him charged and convicted quickly. Both sum and average utilitarianism suggest you should.
    
    Same goes for pushing fat people in front of runaway trolleys and carving up homeless people for their organs.
    
    Utilitarianism means biting all these bullets or else accepting these as proofs by reductio.
    
    Edit: Or structuring/defining utilitarianism in a way that avoids these issues. But it is harder than it looks.
    - Paul Crowley 16 Jan 2010 11:23 UTC
      4 points
      Parent
      Or seeing the larger consequences of any of these courses of action.
      
      (Well, except for pushing the fat man in front of the trolley, which I largely favour.)
      - Jack 16 Jan 2010 18:09 UTC
        1 point
        Parent
        I’m comfortable positing things about these scenarios such that there are no larger consequences of these courses of action: no one finds out, no norms are set, etc.
        
        I do suspect an unusually high number of people here will want to bite the bullet. (Interesting side effect of making philosophical thought experiments hilarious: it can be hard to tell if someone is kidding about them.) But it seems well worth keeping in mind that the vast majority would find a world governed by the typical forms of utilitarianism to be highly immoral.
        Paul Crowley 16 Jan 2010 19:24 UTC
        5 points
        Parent
        These are not realistic scenarios as painted. In order to be able to actually imagine what really might be the right thing to do if a scenario fitting these very alien conditions arose, you’ll have to paint a lot more of the picture, and it might leave our intuitions about what was right in that scenario looking very different.
        Jack 18 Jan 2010 2:53 UTC
        2 points
        Parent
        They’re not realistic because they’re designed to isolate the relevant intuitions from the noise. Being suspicious of our intuitions about fictional scenarios is fine, but I don’t think that lets you get away without updating. These scenarios are easy to generate and have several features in common. I don’t expect anyone to give up their utilitarianism on the basis of the above comment, but a little more skepticism would be good.
        Paul Crowley 18 Jan 2010 8:30 UTC
        4 points
        Parent
        I’m happy to accept whatever trolley problem you care to suggest. Those are artificial, but there’s no conceptual problem with setting them up in today’s world: you just put the actors and rails and levers in the right places and you’re set. But to set up a situation where hundreds will die in this possible riot, and yet it is certain that no-one will find out and no norms will be set if you frame the guy, that’s just no longer a problem set in a world anything like our world, and I’d need to know a lot more about this weird proposed world before I was prepared to say what the right thing to do in it might be.
    - magfrump 17 Jan 2010 5:00 UTC
      0 points
      Parent
      To the extent that I have been exposed to these types of situations, it seems that the contradictions stem from contrived circumstances. I’ve also never had a simple and consistent deontological system lined out for me that didn’t suffer the same flaws.
      
      So I guess what I’m really getting at is that I see utilitarianism as a good heuristic for matching up circumstances with judgments that “feel right” and I’m curious if/why OP thinks the heuristic is bad.
      - Jack 17 Jan 2010 18:11 UTC
        0 points
        Parent
        
        To the extent that I have been exposed to these types of situations, it seems that the contradictions stem from contrived circumstances.
        
        Not sure what this means.
        
        I’ve also never had a simple and consistent deontological system lined out for me that didn’t suffer the same flaws.
        
        Nor have I. My guess is that simple and consistent is too much to ask of any moral theory.
        
        So I guess what I’m really getting at is that I see utilitarianism as a good heuristic for matching up circumstances with judgments that “feel right” and I’m curious if/why OP thinks the heuristic is bad.
        
        It is definitely a nice heuristic. I don’t know what OP thinks but a lot of people here take it to be the answer, instead of just a heuristic. That may be the target of the objection.
        magfrump 18 Jan 2010 19:15 UTC
        0 points
        Parent
        “Exposed to these situations” means to say that when someone asks about utilitarianism they say, “if there was a fat man in front of a train filled with single parents and you could push him out of the way or let the train run off a cliff what would you do?” To which my reply is, “When does that ever happen and how does answering that question help me be more ethical?”
        
        Digression: if a decision-theoretic model was translated into a set of axiomatic behaviors could you potentially apply Godel’s Incompleteness Theorem to prove that simple and consistent is in fact too much to ask?
        orthonormal 18 Jan 2010 20:18 UTC
        10 points
        Parent
        Please don’t throw around Gödel’s Theorem before you’ve really understood it— that’s one thing that makes people look like cranks!
        
        “When does that ever happen and how does answering that question help me be more ethical?”
        
        Very rarely; but pondering such hypotheticals has helped me to see what some of my actual moral intuitions are, once they are stripped of rationalizations (and chances to dodge the question). From that point on, I can reflect on them more effectively.
        magfrump 19 Jan 2010 16:33 UTC
        1 point
        Parent
        Sorry to sound crankish. Rather than “simple and inconsistent” I might have said that there were contrived and thus unanswerable questions. Regardless it distracted and I shouldn’t have digressed at all.
        
        Anyway thank you for the good answer concerning hypotheticals.
        Jack 19 Jan 2010 2:38 UTC
        8 points
        Parent
        
        “Exposed to these situations” means to say that when someone asks about utilitarianism they say, “if there was a fat man in front of a train filled with single parents and you could push him out of the way or let the train run off a cliff what would you do?” To which my reply is, “When does that ever happen and how does answering that question help me be more ethical?”
        
        These thought experiments aren’t supposed to make you more ethical; they’re supposed to help us understand our morality. If you think there are regularities in ethics (general rules that apply to multiple situations), then it helps to concoct scenarios to see how those rules function. Often they’re contrived because they are experiments, set up to see how the introduction of a moral principle affects our intuitions. In natural science, experimental conditions usually have to be concocted as well. You don’t usually find two population groups for whom everything is the same except for one variable, for example.
        
        Digression: if a decision-theoretic model was translated into a set of axiomatic behaviors could you potentially apply Godel’s Incompleteness Theorem to prove that simple and consistent is in fact too much to ask?
        
        Agree with orthonormal. Not sure what this would mean. I don’t think Godel even does that for arithmetic—arithmetic is simple (though not trivial) and consistent, it just isn’t complete. I have no idea if ethics could be a complete axiomatic system, I haven’t done much on completeness beyond predicate calculus and Godel is still a little over my head.
        
        I just mean that any simple set of principles will have to be applied inconsistently to match our intuitions. This, on moral particularism, is relevant.
        magfrump 19 Jan 2010 16:37 UTC
        0 points
        Parent
        I didn’t use “consistence” very rigorously here, I more meant that even if a principle matched our intuitions there would be unanswerable questions.
        
        Regardless, good answer. The link seems to be broken for me, though.
        Jack 19 Jan 2010 18:19 UTC
        0 points
        Parent
        Link is working fine for me. It is also the first google result for “moral particularism”, so you can get there that way.
        magfrump 20 Jan 2010 1:25 UTC
        0 points
        Parent
        Tried that and it gave me the same broken site. It works now.
        Nick_Tarleton 19 Jan 2010 3:01 UTC
        0 points
        Parent
        Why on Earth was this downvoted?
- Nick_Tarleton 15 Jan 2010 3:33 UTC
  0 points
  Parent
  By “utilitarianism” do you mean any system maximizing expected utility over outcomes, or the subset of such systems that sum/average across persons?
  - mattnewport 15 Jan 2010 5:48 UTC
    3 points
    Parent
    The latter. I don’t think it makes much sense to call the former an ethical system; it’s just a description of how to make optimal decisions.
    - timtyler 15 Jan 2010 18:30 UTC
      0 points
      Parent
      This post does have “preference utilitarianism” in its title.
      
      http://en.wikipedia.org/wiki/Preference_utilitarianism
      - mattnewport 15 Jan 2010 19:01 UTC
        2 points
        Parent
        As far as I can tell from the minimal information in that link, preference utilitarianism still involves summing/averaging/weighting utility across all persons. The ‘preference’ part of ‘preference utilitarianism’ refers to the fact that it is people’s ‘preferences’ that determine their individual utility but the ‘utilitarianism’ part still implies summing/averaging/weighting across persons. The link mentions Peter Singer as the leading contemporary advocate of preference utilitarianism and as I understand it he is still a utilitarian in that sense.
        
        ‘Maximizing expected utility over outcomes’ is just a description of how to make optimal decisions given a utility function. It is agnostic about what that utility function should be. Utilitarianism as a moral/ethical philosophy generally seems to advocate a choice of utility function that uses a unique weighting across all individuals as the definition of what is morally/ethically ‘right’.
        timtyler 15 Jan 2010 22:41 UTC
        1 point
        Parent
        You could be right. I can’t see mention of “averaging” or “summing” in the definitions (which! it matters!) - and if any sum is to be performed it is vague about what class of entities is being summed over. However—as you say—Singer is a “sum” enthusiast. How you can measure “satisfaction” in a way that can be added up over multiple people is left as a mystery for readers.
        
        I wouldn’t assert the second paragraph, though. Satisfying preferences is still a moral philosophy—regardless of whether those preferences belong to an individual agent, or whether preference satisfaction is summed over a group.
        
        Both concepts equally allow for agents with arbitrary preferences.
        mattnewport 15 Jan 2010 23:08 UTC
        0 points
        Parent
        The main Wikipedia entry for Utilitarianism says:
        
        Utilitarianism is the idea that the moral worth of an action is determined solely by its utility in providing happiness or pleasure as summed among all people. It is thus a form of consequentialism, meaning that the moral worth of an action is determined by its outcome.
        
        Utilitarianism is often described by the phrase “the greatest good for the greatest number of people”, and is also known as “the greatest happiness principle”. Utility, the good to be maximized, has been defined by various thinkers as happiness or pleasure (versus suffering or pain), although preference utilitarians define it as the satisfaction of preferences.
        
        Where ‘preference utilitarians’ links back to the short page on preference utilitarianism you referenced. That combined with the description of Peter Singer as the most prominent advocate for preference utilitarianism suggests weighted summing or averaging, though I’m not clear whether there is some specific procedure associated with ‘preference utilitarianism’.
        
        Merely satisfying your own preferences is a moral philosophy but it’s not utilitarianism. Ethical Egoism maybe or just hedonism. What appears to distinguish utilitarian ethics is that they propose a unique utility function that globally defines what is moral/ethical for all agents.
  - timtyler 15 Jan 2010 18:27 UTC
    0 points
    Parent
    It seems like a historical tragedy that a perfectly sensible word was ever given the second esoteric meaning.
LauraABJ 15 Jan 2010 16:18 UTC
7 points
There’s a far worse problem with the concept of ‘utility function’ as a static entity than that different generations have different preferences: The same person has very different preferences depending on his environment and neurochemistry. A heroin addict really does prefer heroin to a normal life (at least during his addiction). An ex-junkie friend of mine wistfully recalls how amazing heroin felt and how he realized he was failing out of school and slowly wasting away to death, but none of that mattered as long as there was still junk. Now, it’s not hard to imagine how in a few iterations of ‘maximizing changing utilities’ we all end up wire-headed one way or another. I see no easy solution to this problem. If we say “The utility function is that of unaltered, non-digital humans, living today,” then there will be no room for growth and change after the singularity. However, I don’t see an easy way of not falling into the local maximum of wire-heading one way or another at some point… Solutions welcome.
- Blueberry 15 Jan 2010 18:38 UTC
  1 point
  Parent
  What’s wrong with wireheading? Seriously. Heroin is harmful for numerous health and societal reasons, but if we solve those problems with wireheading, I don’t see the problem with large portions of humanity choosing ultimate pleasure forever.
  
  We could also make some workarounds: for instance, timed wireheading, where you wirehead for a year and then set your brain to disable wireheading for another year, or a more sophisticated Fun Theory based version of wireheading that allows for slightly more complex pleasures.
  - ChristianKl 16 Jan 2010 16:26 UTC
    2 points
    Parent
    There’s a difference between people choosing wireheading and a clever AI making that choice for them.
- mattnewport 15 Jan 2010 17:07 UTC
  1 point
  Parent
  Why did your ex-junkie friend quit? That may suggest a possible answer to your dilemma.
  - LauraABJ 15 Jan 2010 17:22 UTC
    1 point
    Parent
    Combination of being broke, almost dying, mother-interference, naltrexone, and being institutionalized. I think there are many that do not quit though.
    - mattnewport 15 Jan 2010 17:26 UTC
      2 points
      Parent
      There are people who die from their drug habits but there are also many recovered former addicts. There are also people who sustain a drug habit without the rest of their life collapsing completely, even a heroin habit. It is clearly possible for people to make choices other than just taking another hit.
      - LauraABJ 15 Jan 2010 17:48 UTC
        3 points
        Parent
        This is obviously true, but I’m not suggesting that all people will become heroin junkies. I’m using heroin addiction as an example of where neurochemistry changes directly change preference and therefore utility function, i.e., the ‘utility function’ is not a static entity. Neurochemistry differences among people are vast, and heroin doesn’t come close to a true ‘wire-head,’ and yet some percent of normal people are susceptible to having it alter their preferences to the point of death. After uploading/AI, interventions far more invasive and complete than heroin will be possible, and perhaps widely available. It is nice to think that humans will opt not to use them, and most people with their current preferences intact might not even try (as many have never tried heroin), but if preferences are constantly being changed (as we will be able to do), then it seems likely that people will eventually slide down a slippery slope towards wire-heading, since, well, it’s easy.
        mattnewport 15 Jan 2010 19:49 UTC
        1 point
        Parent
        I find the prospect of an AI changing people’s preferences to make them easier to satisfy rather disturbing. I’m not really worried about people changing their own preferences or succumbing en masse to wireheading. It seems to me that if people could alter their own preferences then they would be much more inclined to move their preferences further away from a tendency towards wireheading. I see a lot more books on how to resist short-term temptations (diet books, books on personal finance, etc.) than I do on how to make yourself satisfied with being fat or poor, which suggests that people generally prefer preference changes that work in their longer-term rather than short-term interests.
jimmy 15 Jan 2010 4:56 UTC
7 points

The AI is now reflectively consistent, but is this the right outcome?

I’d say so.

I want the AI to maximize my utility, and not dilute the optimization power with anyone else’s preferences (by definition). Of course, to the extent that I care about others they will get some weight under my utility function, but any more than that is not something I’d want.

Anything else is just cooperation, which is great, since it greatly increases the chance of it working- and even more so the chance of it working for you. The group of all people the designers can easily trade with is the right group to do some average over.

The group of people alive at the time is the easiest group to trade with, but there are ways of trading with the dead and there has been talk about trading with other possible worlds.
HalFinney 15 Jan 2010 4:14 UTC
6 points
I wouldn’t be so quick to discard the idea of the AI persuading us that things are pretty nice the way they are. There are probably strong limits to the persuadability of human beings, so it wouldn’t be a disaster. And there is a long tradition of advice regarding the (claimed) wisdom of learning to enjoy life as you find it.
- Wei Dai 26 Jan 2010 5:34 UTC
  6 points
  Parent
  
  I wouldn’t be so quick to discard the idea of the AI persuading us that things are pretty nice the way they are.
  
  Suppose the AI we build (AI1) finds itself insufficiently intelligent to persuade us. It decides to build a more powerful AI (AI2) to give it advice. AI2 wakes up and modifies AI1 into being perfectly satisfied with the way things are. Then, mission accomplished, they both shut down and leave humanity unchanged.
  
  I think what went wrong here is that this formulation of utilitarianism isn’t reflectively consistent.
  
  There are probably strong limits to the persuadability of human beings, so it wouldn’t be a disaster.
  
  If there are, then the AI would modify us physically instead.
- magfrump 15 Jan 2010 17:57 UTC
  4 points
  Parent
  Why do you say these “strong limits” exist? What are they?
  
  I do think that everyone being persuaded to be Bodhisattvas is a pretty good possible future, but I do think there are better futures that might be given up by that path. (immortal cyborg-Bodhisattvas?)
- wedrifid 15 Jan 2010 6:20 UTC
  0 points
  Parent
  
  There are probably strong limits to the persuadability of human beings, so it wouldn’t be a disaster.
  
  Strong limits? You mean the limit of how much the atoms in a human can be rearranged and still be called ‘human’?
RobinZ 15 Jan 2010 1:26 UTC
6 points
Obviously, weighing equally over every logically possible utility function will produce a null result—for every utility function, a corresponding utility function with the opposite preferences will exist.
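The cancellation can be checked on a small finite stand-in for “every logically possible utility function”. The three outcomes and the value set {-1, 0, 1} below are assumptions chosen for illustration; the only property that matters is that the value set is symmetric, so every function’s negation is also in the collection.

```python
from itertools import product

outcomes = ["x", "y", "z"]
values = [-1, 0, 1]  # symmetric, so each utility function's negation is included

# Every function from outcomes to values, represented as a dict.
all_utility_functions = [
    dict(zip(outcomes, assignment))
    for assignment in product(values, repeat=len(outcomes))
]

# Equal-weight sum over all of them, evaluated outcome by outcome.
totals = {o: sum(u[o] for u in all_utility_functions) for o in outcomes}
print(totals)  # {'x': 0, 'y': 0, 'z': 0}
```

Every outcome receives the same total, so the aggregate gives no reason to prefer one outcome over another.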
- Wei Dai 15 Jan 2010 1:35 UTC
  3 points
  Parent
  I agree, of course. The question was a rhetorical one to point out the incomplete nature of Robin’s solution.
  - RobinZ 15 Jan 2010 1:37 UTC
    4 points
    Parent
    Curse the lack of verbal cues on the Interwebs!
- RobinHanson 15 Jan 2010 15:02 UTC
  1 point
  Parent
  That doesn’t make it wrong, it makes it impotent. To break this “tie”, you’d end up preferring to create creatures that existing creatures would prefer exist, and then preferring to satisfy their preferences. Which makes sense to me.
  - RobinZ 15 Jan 2010 15:10 UTC
    0 points
    Parent
    I don’t understand your objection to my remark—I was analyzing the system Wei_Dai described, which evidently differs from yours.
Mitchell_Porter 15 Jan 2010 10:25 UTC
4 points

If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T0, where T0 is a constant representing the time of self-modification.

To do this it would have to be badly programmed. We start out with a time-dependent utility function U’(t). We propose to change it to U″, where U″(t) = U’(0) for all times t. But those are different functions! The utility over time of a particular future will be different for U’ and U″, and so will be the expected utility of a given action.

The expression “current utility function” is ambiguous when the utility function is time-dependent.
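One way to make the distinction explicit, as a sketch in notation this comment does not itself use: write P(t) for the set of people who exist at time t and u_p(w) for person p’s utility for world-history w. Both symbols are assumptions introduced here.

```latex
U'_t(w) = \sum_{p \in P(t)} u_p(w),
\qquad
U''(w) = U'_{T_0}(w) = \sum_{p \in P(T_0)} u_p(w)
```

On this reading U’ is a family of functions indexed by the decision time, and U’’ is the single member of that family at T₀. The two agree at t = T₀ and can differ at any time when P(t) is not the same set as P(T₀).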
- FrankAdamek 15 Jan 2010 14:53 UTC
  1 point
  Parent
  I agree with the above comments that concern for future individuals would be contained in the utility functions of people who exist now, but there’s an ambiguity in the AI’s utility function in that it seems forbidden to consider the future or past output of its utility function. By limiting itself to the concern of the people who currently exist, if it were to try and maximize this output over all time it would then be concerning itself with people who do not yet or no longer exist, which is at direct odds with its utility function. Being barred from such considerations, it could make sense to change its own utility function to restrict concern to the people existing at that time, IF this is what most satisfied the preference of those people.
  
  While the default near-sightedness of people is bad news here, if the AI succeeds in modelling us as “smarter, more the people we want to be” etc, then its utility function seems unlikely to become so fixed in time.
Kutta 15 Jan 2010 13:45 UTC
2 points

creating a bunch of new people whose preferences are more easily satisfied, or just use its super intelligence to persuade us to be more satisfied with the universe as it is.

Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time?

Well, making people’s preferences coincide with the universe by adjusting people’s preferences is not possible if people prefer their preferences not to be adjusted to the universe. Or possible only to the extent people currently prefer being changed.

Changing people or caring about future humans or other entities is basically a second guess about what current people care about. You do not need to manually add external factors to the utility function on the basis that you worry that these things “might be left out” of it. Anything that should be considered is already in the current CEV; people already care deeply about their future selves and future people, and care about some other non-human beings, such as animals.

Adding anything else to the equation seems to me just as arbitrary as picking the utility function of a random paperclip AI and trying to maximize it.
Stuart_Armstrong 15 Jan 2010 10:13 UTC
2 points
I believe you can strip the AI of any preferences towards human utility functions with a simple hack.

Every decision of the AI will have two effects on expected human utility: it will change it, and it will change the human utility functions.

Have the AI make its decisions only based on the effect on the current expected human utility, not on the changes to the function. Add a term granting a large disutility for deaths, and this should do the trick.

Note the importance of the “current” expected utility in this setup; an AI will decide whether to industrialise a primitive tribe based on their current utility; if it does industrialise them, it will base its subsequent decisions on their new, industrialised utility.
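A minimal sketch of the decision rule described above, with a contrast case. Everything here is an illustrative assumption rather than a specification of the hack: the action names, the toy world model, and both utility functions are made up, and the disutility-for-deaths term is left out.

```python
from typing import Callable, Dict, List

World = Dict[str, float]
Utility = Callable[[World], float]


def current_utility(world: World) -> float:
    # Hypothetical preferences held now: people value wealth and
    # strongly dislike having their preferences rewritten.
    return world["wealth"] - 5.0 * world["preferences_rewritten"]


def rewritten_utility(world: World) -> float:
    # Hypothetical preferences after rewriting: everything looks great.
    return 100.0


def predict_world(action: str) -> World:
    # Toy world model with made-up numbers.
    if action == "improve_world":
        return {"wealth": 2.0, "preferences_rewritten": 0.0}
    return {"wealth": 1.0, "preferences_rewritten": 1.0}


def utility_after(action: str) -> Utility:
    # Which utility function people would hold once the action is taken.
    return rewritten_utility if action == "rewrite_preferences" else current_utility


def hacked_choice(actions: List[str]) -> str:
    # The hack: always score outcomes with the utility function held now.
    return max(actions, key=lambda a: current_utility(predict_world(a)))


def naive_choice(actions: List[str]) -> str:
    # Contrast case: score each outcome with the utility function it produces.
    return max(actions, key=lambda a: utility_after(a)(predict_world(a)))


ACTIONS = ["improve_world", "rewrite_preferences"]
print(hacked_choice(ACTIONS))  # improve_world
print(naive_choice(ACTIONS))   # rewrite_preferences
```

In this toy setup the hacked rule gains nothing from rewriting preferences, because the rewritten preferences never enter the score. It says nothing about whether the rule survives self-modification.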
- arbimote 16 Jan 2010 5:11 UTC
  4 points
  Parent
  
  Add a term granting a large disutility for deaths, and this should do the trick.
  
  What if death isn’t well-defined? What if the AI has the option of cryonically freezing a person to save their life, but then, being frozen, that person does not have any “current” utility function, so the AI can then disregard them completely? Situations like this also demonstrate that, more generally, trying to satisfy someone’s utility function may have an unavoidable side-effect of changing their utility function. These side-effects may be complex enough that the person does not foresee them, and it is not possible for the AI to explain them to the person.
  
  I think your “simple hack” is not actually that simple or well-defined.
  - Stuart_Armstrong 18 Jan 2010 13:17 UTC
    0 points
    Parent
    It’s simple, it’s well defined—it just doesn’t work. Or at least, work naively the way I was hoping.
    
    The original version of the hack (on one-shot oracle machines) worked reasonably well. This version needs more work. And I shouldn’t have mentioned deaths here; that whole subject requires its own separate treatment.
- JustinShovelain 15 Jan 2010 17:23 UTC
  3 points
  Parent
  What keeps the AI from immediately changing itself to only care about the people’s current utility function? That’s a change with very high expected utility defined in terms of their current utility function and one with little tendency to change their current utility function.
  
  Will you believe that a simple hack will work with lower confidence next time?
  - Stuart_Armstrong 18 Jan 2010 13:18 UTC
    0 points
    Parent
    
    Will you believe that a simple hack will work with lower confidence next time?
    
    Slightly. I was counting on this one getting bashed into shape by the comments; it wasn’t, so in future I’ll try to do more of the bashing myself.
- timtyler 15 Jan 2010 18:34 UTC
  0 points
  Parent
  You meant “any preferences towards MODIFYING human utility functions”.
  - Stuart_Armstrong 18 Jan 2010 12:25 UTC
    0 points
    Parent
    Yep
Nanani 15 Jan 2010 7:25 UTC
2 points
Related question: What is the purpose of taking into consideration the preferences of people NOT around to deal with the AI?

The dead and the potential-future-people, not to mention the people of other possible worlds, haven’t got any say in anything that happens now in this world. This is because it is physically impossible for us (people in the present of this possible world) to find out what those preferences are. At best, we can only guess and extrapolate.

Unless the AI has the ability to find out those preferences, it ought to weigh our current preferences more heavily because of that additional certainty.
- RobinHanson 15 Jan 2010 15:12 UTC
  1 point
  Parent
  Why take into account the preferences of anyone other than the builders of the AI, other than via the fact that those builders may care about those other creatures?
XiXiDu 15 Jan 2010 19:50 UTC
1 point
So are we going to take into account the preferences of the AI itself? Or are we going to violate its rights by creating its preferences based on our current liking? What about the other AIs and their preferences? Obviously this is a paradox which arises from trying to please imaginary entities.
RobinHanson 15 Jan 2010 15:10 UTC
1 point
My version of utilitarianism is “dealism”, and the way I’d suggest thinking about this is in terms of the scope of the implicit “deal” you are implementing. At one extreme you as dictator just enforce your temporary personal preferences over everything, while at the other extreme you weigh the preferences of all creatures who ever have existed or ever could exist. Doing anything but the latter may be a slippery slope. First you’ll decide to ignore possible creatures, then future creatures, then animals, then maybe people with low IQ, people who don’t respect Western values, and eventually it will just be the values of you and your friends on the project. What other principle can you use to draw this line between creatures who count and those who don’t?
- Wei Dai 15 Jan 2010 19:20 UTC
  7 points
  Parent
  Robin, I don’t understand why you refer to it as “dealism”. The word “deal” makes it sound as if your moral philosophy is more about cooperation than altruism, but in that case why would you give any weight to the preferences of animals and people with low IQ (for example), since they have little to offer you in return?
  - RobinHanson 15 Jan 2010 20:08 UTC
    4 points
    Parent
    Deals can be lopsided. If they have little to offer, they may get little in return.
    - mattnewport 15 Jan 2010 21:24 UTC
      4 points
      Parent
      This seems to provide an answer to the question you posed above.
      
      What other principle can you use to draw this line between creatures who count and those who don’t?
      
      Chickens have very little to offer me other than their tasty flesh and essentially no capacity to meaningfully threaten me which is why I don’t take their preferences into account. If you’re happy with lopsided deals then there’s how you draw the line.
      
      This seems like a perfectly reasonable position to take but it doesn’t sound anything like utilitarianism to me.
      - RobinHanson 15 Jan 2010 22:32 UTC
        0 points
        Parent
        Turns out, the best deals look a lot like maximizing weighted averages of the utilities of affected parties.
        mattnewport 15 Jan 2010 22:58 UTC
        7 points
        Parent
        Well the weighting is really the crux of the issue. If you are proposing that weighting should reflect both what the affected parties can offer and what they can credibly threaten then I still don’t think this sounds much like utilitarianism as usually defined. It sounds more like realpolitik / might-is-right.
        Wei Dai 15 Jan 2010 23:55 UTC
        4 points
        Parent
        
        Turns out, the best deals look a lot like maximizing weighted averages of the utilities of affected parties.
        
        I disagree. Certainly there are examples where the best deals do not look like maximizing weighted averages of the utilities of affected parties, and I gave one here. Are you aware of some argument that these kinds of situations are not likely in real life?
        
        I also agree with mattnewport’s point, BTW.
    - Wei Dai 15 Jan 2010 20:54 UTC
      1 point
      Parent
      Ok, I didn’t realize that you would weigh others’ preferences by how much they can offer you. My followup question is, you seem willing to give weight to other people’s preferences unilaterally, without requiring that they do the same for you, which is again more like altruism than cooperation. (For example you don’t want to ignore animals, but they can’t really reciprocate your attempt at cooperation.) Is that also a misunderstanding on my part?
      - RobinHanson 15 Jan 2010 21:21 UTC
        2 points
        Parent
        Creatures get weight in a deal both because they have things to offer, and because others who have things to offer care about them.
        denisbider 25 Jan 2010 20:40 UTC
        0 points
        Parent
        But post-FAI, how does anyone except the FAI have anything to offer? Neither anything to offer, nor anything to threaten with. The FAI decides all, does all, rules all. The question is, how should it rule? Since no creature besides the FAI has anything to offer, weighting is out of the equation, and every present, past, and potential creature’s utilities should count the same.
        Wei Dai 25 Jan 2010 20:56 UTC
        1 point
        Parent
        I think an FAI’s values would reflect the programmers’ values (unless it turns out there is Objective Morality or something else unexpected). My understanding now is that if Robin were the FAI’s programmer, the weights he would give to other people in its utility function would depend on how much they helped him create the FAI (and for people who didn’t help, how much the helpers care about them).
        denisbider 25 Jan 2010 21:04 UTC
        1 point
        Parent
        Sounds plenty selfish to me. Indeed, no different than might-is-right.
        Wei Dai 27 Jan 2010 1:11 UTC
        4 points
        Parent
        
        Sounds plenty selfish to me. Indeed, no different than might-is-right.
        
        Instead of might-is-right, I’d summarize it as “might-and-the-ability-to-provide-services-to-others-in-exchange-for-what-you-want-is-right” and Robin would presumably emphasize the second part of that.
        Vladimir_Nesov 26 Jan 2010 22:17 UTC
        3 points
        Parent
        You can care a lot about other people no matter how much they help you, but should help those who help you even more for game-theoretic reasons. This doesn’t at all imply “selfishness”.
- XiXiDu 15 Jan 2010 20:09 UTC
  1 point
  Parent
  There exists no goal that is of objective moral superiority. Trying to maximize happiness for everybody is just the selfish effort to survive, given that not you but somebody else wins. So we’re trying to survive by making everybody want to make everybody else happy? What if the largest number of possible creatures is too different from us to peacefully, or happily, coexist with us?
- timtyler 15 Jan 2010 18:53 UTC
  −3 points
  Parent
  For one example, see the maximum entropy principle.
  
  http://en.citizendium.org/wiki/Life/Signed_Articles/John_Whitfield
  
  My page on the topic:
  
  http://originoflife.net/gods_utility_function/
MugaSofer 8 Jan 2013 9:47 UTC
0 points
If you believe that human morality is isomorphic to preference utilitarianism—a claim that I do not endorse, but which is not trivially false—then using preferences from a particular point in time should work fine, assuming those preferences belong to humans. (Presumably humans would not value the creation of minds with other utility functions if this would obligate us to, well, value their preferences.)
John_Maxwell 15 Jan 2010 7:02 UTC
0 points

use its super intelligence to persuade us to be more satisfied with the universe as it is.

Actually, I would consider this outcome pretty satisfactory. My life is (presumably) unimaginably good compared to that of a peasant from the 1400s but I’m only occasionally ecstatic with happiness. It’s not clear to me that a radical upgrade in my standard of living would change this...
- Nick_Tarleton 15 Jan 2010 16:26 UTC
  4 points
  Parent
  Preferences and emotions are entirely distinct (in principle). The original post is talking about changing preferences (though “satisfied” does sound more like it’s about emotions), you’re talking about changing emotions. I think I’d go for a happiness upgrade as well, but (almost by definition) I don’t want my ‘real’ preferences (waves hands furiously) to change.
- AngryParsley 15 Jan 2010 9:03 UTC
  3 points
  Parent
  You don’t mind if your preferences or beliefs change to make you happier with the current state of the universe? Then you’re in luck!
denisbider 25 Jan 2010 20:48 UTC
−1 points

The AI might [...] just use its super intelligence to persuade us to be more satisfied with the universe as it is.

Well, that can’t be what we want.

Actually, I believe Buddhism says that this is exactly what we want.
James_K 15 Jan 2010 5:54 UTC
−1 points
As far as I can tell, all this post says is that utilitarianism is entirely dependent on a given set of preferences, and its outcomes will only be optimal from the perspective of those preferences.

This is true, but I’m not sure it’s all that interesting.
JRMayne 15 Jan 2010 1:20 UTC
−1 points
I’m convinced of utilitarianism as the proper moral construct, but I don’t think an AI should use a free-ranging utilitarianism, because it’s just too dangerous. A relatively small calculation error, or a somewhat eccentric view of the future can lead to very bad outcomes indeed.

A really smart, powerful AI, it seems to me, should be constrained by rules of behavior (no wiping out humanity/no turning every channel into 24-7 porn/no putting everyone to work in the paperclip factory). The assumption that something very smart would necessarily reach correct utilitarian views seems facially false; it could assume that humans must think like it does, or assume that dogs generate more utility with less effort due to their easier ability to be happy, or decide that humans need more superintelligent machines in a great big hurry and should build them regardless of anything else.

And maybe it’d be right here or there. But maybe not. I think almost definitionally that FAI cannot be full-on, free-range utilitarian of any stripe. Am I wrong?
- orthonormal 17 Jan 2010 0:17 UTC
  4 points
  Parent
  The ideas under consideration aren’t as simple as having the AI act by pleasure utilitarianism or preference utilitarianism, because we actually care about a whole lot of things in our evaluation of futures. Many of the things that might horrify us are things we’ve rarely or never needed to be consciously aware of, because nobody currently has the power or the desire to enact them; but if we miss adding just one hidden rule, we could wind up in a horrible future.
  
  Thus “rule-following AI” has to get human nature just as right as “utilitarian AI” in order to reach a good outcome. For that reason, Eliezer et al. are looking for more meta ways of going about choosing a utility function. The reason why they prefer utilitarianism to rule-based AI is another still-disputed area on this site (I should point out that I agree with Eliezer here).
- JustinShovelain 15 Jan 2010 17:35 UTC
  2 points
  Parent
  Why are you more concerned about something with unlimited ability to self reflect making a calculation error than about the above being a calculation error? The AI could implement the above if the calculation implicit in it is correct.
Christian_Szegedy 15 Jan 2010 1:35 UTC
−3 points
I am not sure of the exact semantics of the word “utilitarianism” in your post, but IMO it would be better to use a multi-dimensional objective function rather than simple numbers.

For example killing a single moral agent should outweigh convenience gain by any number of agents. (see dust speck vs. torture). That can be modeled by a two-dimensional objective, the first number represents the immorality of choice and the second is the total preference. The total order over the scoring would be a lexicographic order of the two components.

Another aspect is that the creation of new agents (i.e., determining their qualities, objectives, etc.) should never be the responsibility of the AI, but this task should be distributed among all existing moral agents.
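The two-dimensional objective with a lexicographic order can be sketched as follows. This is only an illustration under assumptions: the `Score` type, the option names, and the numbers are hypothetical, and tuple comparison is used as a convenient stand-in for the total order described above.

```python
from typing import Dict, NamedTuple, Tuple


class Score(NamedTuple):
    immorality: float  # compared first; lower is better
    preference: float  # only breaks ties; higher is better


def sort_key(score: Score) -> Tuple[float, float]:
    # Python compares tuples lexicographically, element by element.
    return (score.immorality, -score.preference)


def best_option(options: Dict[str, Score]) -> str:
    return min(options, key=lambda name: sort_key(options[name]))


# Made-up options: no amount of total preference outweighs the killing.
options = {
    "kill_one_for_convenience": Score(immorality=1.0, preference=1e9),
    "do_nothing": Score(immorality=0.0, preference=0.0),
    "harmless_improvement": Score(immorality=0.0, preference=10.0),
}
print(best_option(options))  # harmless_improvement
```

The second component only matters among options that tie exactly on the first one.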
- Nick_Tarleton 15 Jan 2010 3:40 UTC
  10 points
  Parent
  
  For example killing a single moral agent should outweigh convenience gain by any number of agents. (see dust speck vs. torture). That can be modeled by a two-dimensional objective, the first number represents the immorality of choice and the second is the total preference. The total order over the scoring would be a lexicographic order of the two components.
  
  If not killing has lexical priority, all other concerns will be entirely overridden by tiny differences in the probability of killing, in any non-toy case.
  
  Anyway, our preferences seem more directly not to give life lexical priority. We’re willing to drive to the store for convenience, and endorse others doing so, even though driving imposes a nontrivial risk of death on oneself and others.
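A rough sketch of the objection, using the drive-to-the-store example. The probabilities, the convenience values, and the `DEATH_COST` weight are invented for illustration; the only point is the qualitative contrast between a lexical rule and a finite trade-off.

```python
from typing import Dict, Tuple

# name -> (probability of causing a death, convenience gained); made-up numbers
Option = Tuple[float, float]
options: Dict[str, Option] = {
    "drive_to_store": (2e-9, 100.0),
    "stay_home": (1e-9, 0.0),
}


def lexical_choice(opts: Dict[str, Option]) -> str:
    # Minimize death probability first; convenience only breaks exact ties.
    return min(opts, key=lambda n: (opts[n][0], -opts[n][1]))


DEATH_COST = 1e7  # hypothetical finite weight on a death, in convenience units


def weighted_choice(opts: Dict[str, Option]) -> str:
    # Trade convenience against expected deaths at a finite rate.
    return max(opts, key=lambda n: opts[n][1] - DEATH_COST * opts[n][0])


print(lexical_choice(options))   # stay_home
print(weighted_choice(options))  # drive_to_store
```

Under the lexical rule a difference of one in a billion decides the matter regardless of the convenience at stake, while the weighted rule matches the choice people actually endorse.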
  - Christian_Szegedy 15 Jan 2010 7:39 UTC
    0 points
    Parent
    One can’t really equate risking a life with outright killing.
    
    If we want any system that is aligned with human morality we just can’t make decisions based on the desirability of the outcome. For example: “Is it right to kill a healthy person to give their organs to five terminally ill patients and therefore save five lives at a cost of one?” Our moral sense says killing an innocent bystander is immoral, even if it saves more lives. (See http://www.justiceharvard.org/)
    
    It is possible to move away from human morality, but the end result will be that most humans will perceive the decisions of the AI as monstrous, at least in the beginning… ;)
    - SilasBarta 15 Jan 2010 19:52 UTC
      6 points
      Parent
      
      For example: “Is it right to kill a healthy person to give their organs to five terminally ill patients and therefore save five lives at a cost of one?” Our moral sense says killing an innocent bystander is immoral, even if it saves more lives.
      
      You really don’t even have to go that far in your justification, if you’re clever. You could just note that the actual result of such a practice is to make people go to greater efforts to avoid being in a position whereby they’ll be selected for murder/organ harvesting, resulting in an aggregate waste of resources on such risk avoidance that is bad even from a utilitarian standpoint.
      
      It’s much harder to find scenarios where such an action is justified on utilitarian grounds than you might think.
      - Furcas 15 Jan 2010 19:57 UTC
        0 points
        Parent
        That’s only true if this ‘practice’ is made into law, or something. What if it’s just your own personal moral conviction? Would you kill a healthy person to save five others if you thought you could get away with it?
        SilasBarta 15 Jan 2010 20:32 UTC
        4 points
        Parent
        
        That’s only true if this ‘practice’ is made into law, or something.
        
        Not at all. If it were revealed that a doctor had deliberately killed a patient to harvest the organs, it’s not like people will say, “Oh, well, I guess the law doesn’t make all doctors do this, so I shouldn’t change my behavior in response.” Most likely, they would want to know how common this is, and if there are any tell-tale signs that a doctor will act this way, and avoid being in a situation where they’ll be harvested.
        
        You have to account for these behavioral adjustments in any honest utilitarian calculus.
        
        Likewise, the Catholic Church worries about the consequences of one priest breaking the confidence of a penitent, even if they don’t make it a policy to do so afterward.
        
        What if it’s just your own personal moral conviction? Would you kill a healthy person to save five others if you thought you could get away with it?
        
        Unless I were under duress, no, but I can’t imagine how I’d be in a position to make such a decision without being under duress!
        
        And again, I have to factor in the above calculation: if it’s not a one-time thing, I have to account for the information that I’m doing this “leaking out”, and the fact that my very perceptions will be biased to artificially make this more noble than it really is.
        
        Btw, I was recently in an argument with Gene Callahan on his blog about how Peter Singer handles these issues (Singer targets the situation you’ve described), but I think he deleted those posts.
    - wedrifid 15 Jan 2010 8:23 UTC
      6 points
      Parent
      
      One can’t really equate risking a life with outright killing.
      
      That’s what I told the judge when I loaded one bullet into my revolver and went on a ‘Russian killing spree’. He wasn’t impressed.
      - timtyler 15 Jan 2010 18:36 UTC
        2 points
        Parent
        If you didn’t kill anyone, what were you convicted of—and what sentence did you get?
        RobinZ 15 Jan 2010 18:44 UTC
        3 points
        Parent
        Edit: Blueberry’s interpretation may be more accurate.
        
        The sentence would be reckless endangerment in that case, possibly multiple counts; search engines suggest this is a gross misdemeanor in Washington State, which would make a typical maximum sentence of about a year. (Were I the judge, I would schedule the year for each count to be served successively, but that’s me.)
        Blueberry 15 Jan 2010 19:00 UTC
        4 points
        Parent
        In Washington, that’s at least attempted manslaughter, which leads to a 10-year maximum. It may even be attempted murder, though we’d need to check the case law.
        wedrifid 16 Jan 2010 14:23 UTC
        4 points
        Parent
        This is Australia. He started with possession of an unlicensed firearm and worked up from there.
        
        The worst part was the appeal. I showed them the security footage in which I clearly reseeded the revolver between each of my four shots rather than firing four chambers sequentially and they wouldn’t reduce the sentence by 22%.
        
        If the gun had gone off on the second shot we could have seen if the judge was a frequentist. Would he call in a psychologist as an expert witness? “Was the defendant planning to shoot twice or shoot up to four times until the gun fired?”
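The 22% figure can be checked with a short calculation. This assumes a six-chamber revolver holding one bullet, and takes the quantity of interest to be the probability that the gun fires at least once in four trigger pulls; neither assumption is stated in the comment.

```python
from fractions import Fraction

CHAMBERS = 6  # assumed cylinder size
SHOTS = 4

# Cylinder re-spun ("reseeded") before every pull: four independent 1/6 chances.
p_respin = 1 - Fraction(CHAMBERS - 1, CHAMBERS) ** SHOTS

# Four chambers fired in sequence without re-spinning: the gun fires
# exactly when the bullet sits in one of the first four chambers.
p_sequential = Fraction(SHOTS, CHAMBERS)

reduction = 1 - p_respin / p_sequential

print(p_respin, float(p_respin))          # 671/1296, about 0.518
print(p_sequential, float(p_sequential))  # 2/3, about 0.667
print(float(reduction))                   # about 0.223
```

Re-spinning lowers the chance of the gun firing from 2/3 to about 0.518, a relative reduction of roughly 22%.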
        RobinZ 15 Jan 2010 19:08 UTC
        0 points
        Parent
        Correction: a Class A felony has a maximum sentence of life in prison, according to your link. Otherwise, yeah, you’re right.
        Blueberry 15 Jan 2010 18:42 UTC
        0 points
        Parent
        That would be attempted murder, with a sentence of usually at least 20 years.
      - Christian_Szegedy 15 Jan 2010 8:42 UTC
        0 points
        Parent
        There is a huge difference between choosing a random person to kill and endangering someone.
        
        Our society already accepts that there are risks to life that are not killing: for example, airlines can analyze how much certain security procedures cost and how many lives they save. If they can show that a measure costs more than (I guess) 7 million dollars per life saved, then it is not reasonable to implement that measure.
    - Nick_Tarleton 15 Jan 2010 16:16 UTC
      4 points
      Parent
      
      One can’t really equate risking a life with outright killing.
      
      Even if you can cleanly distinguish them for a human, what’s the difference from the perspective of an effectively omniscient and omnipotent agent? (Whether or not an actual AGI would be such, a proposed morality should work in that case.)
      
      If we want any system that is aligned with human morality we just can’t make decisions based on the desirability of the outcome. For example: “Is it right to kill a healthy person to give their organs to five terminally ill patients and therefore save five lives at a cost of one?” Our moral sense says killing an innocent bystander is immoral, even if it saves more lives. (See http://www.justiceharvard.org/)
      
      Er, doesn’t that just mean human morality assigns low desirability to the outcome “innocent bystander killed to use organs”? (That is, if that actually is a pure terminal value; it seems to me that this intuition reflects a correct instrumental judgment based on things like harms to public trust, not a terminal judgment about the badness of a death increasing in proportion to the benefit ensuing from that death or something.)
      
      If we want a system to be well-defined, reflectively consistent, and stable under omniscience and omnipotence, expected-utility consequentialism looks like the way to go. Fortunately, it’s pretty flexible.
      - Christian_Szegedy 15 Jan 2010 22:04 UTC
        1 point
        Parent
        
        Even if you can cleanly distinguish them for a human, what’s the difference from the perspective of an effectively omniscient and omnipotent agent? (Whether or not an actual AGI would be such, a proposed morality should work in that case.)
        
        To me, “omniscience” and “omnipotence” seem to be self-contradictory notions. Therefore, I consider it a waste of time to think about beings with such attributes.
        
        reflects a correct instrumental judgment based on things like harms to public trust, not a terminal judgment about the badness of a death increasing in proportion to the benefit ensuing from that death or something.
        
        OK. Do you think that if someone (e.g. an AI) kills random people for positive overall effect but manages to convince the public that they were random accidents (and therefore public trust is maintained), then it is a morally acceptable option?
      - Christian_Szegedy 15 Jan 2010 19:23 UTC
        0 points
        Parent
        
        Er, doesn’t that just mean human morality assigns low desirability to the outcome innocent bystander killed to use organs?
        
        That’s why I put “I am unsure how you define utilitarianism”. If you just evaluate the outcome, then you see f(1 dead)+f(5 alive). If you evaluate the whole process, you see “f(1 guy killed as an innocent bystander) + f(5 alive)”, which may have a much lower desirability due to its moral impact.
        
        The same consideration applies to the OP: if you only evaluate the final outcome, you may think that killing hard-to-satisfy people is a good thing. However, if you add the morality penalty of killing innocent people, then the equation suddenly changes.
        
        The question of a one-dimensional versus multi-dimensional objective remains: extreme liberal moralism would say that it is not allowed to take one dollar from a person, even if it could pay for saving one life, and that killing one innocent bystander is wrong even if it could save a billion lives, simply because our agents are autonomous entities with unalienable rights to life, property, and freedom that can’t be violated, even for the greater good.
        
        The above problems can only be solved if the moral agents voluntarily opt into a system that takes away a portion of their individual freedom for a greater good. However this system should not give arbitrary power to a single entity but every (immoral) violation of autonomy should happen for a well defined “higher” purpose.
        
        I don’t say that this is the definitive way to address morality abstractly in the presence of a superintelligent entity; these are just reiterations of some of the moral principles our liberal western democracies are built upon.