[Altruist Support] How to determine your utility function
Follows on from HELP! I want to do good.
What have I learned since last time? I’ve learned that people want to see an SIAI donation; I’ll do it as soon as PayPal will let me. I’ve learned that people want more “how” and maybe more “doing”; I’ll write a doing post soon, but I’ve got this and two other background posts to write first. I’ve learned that there’s a nonzero level of interest in my project. I’ve learned that there’s a diversity of opinions, which suggests that if I’m wrong, I’m at least wrong in an interesting way. I may have learned that signalling low status—to avoid intimidating outsiders—may be a worse strategy than signalling that I know what I’m talking about. I’ve learned that I am prone to answering a question other than the one that was asked.
Somewhere in the Less Wrong archives there is a deeply shocking, disturbing post. It’s called Post Your Utility Function.
It’s shocking because basically no-one had any idea. At the time I was still learning, but I knew that having a utility function was important—that it was what made everything else make sense. But I didn’t know what mine was supposed to be. And neither, apparently, did anyone else.
Eliezer commented ‘in prescriptive terms, how do you “help” someone without a utility function?’. This post is an attempt to start to answer this question.
Firstly, what the utility function is and what it’s not. It belongs to the field of instrumental rationality, not epistemic rationality; it is not part of the territory. Don’t expect it to correspond to something physical.
Also, it’s not supposed to model your revealed preferences—that is, your current behavior. If it did then it would mean you were already perfectly rational. If you don’t feel that’s the case then you need to look beyond your revealed preferences, toward what you really want.
In other words, the wrong way to determine your utility function is to think about what decisions you have made, or feel that you would make, in different situations. Which means there’s a chance, just a chance, that up until now you’ve been doing it completely wrong. You haven’t been getting what you wanted.
So in order to play the utility game, you need humility. You need to accept that you might not have been getting what you want, and that it might hurt. All those little subgoals, they might just have been getting you nowhere more quickly.
So only play if you want to.
The first thing is to understand the domain of the utility function. It’s defined over entire world histories. You consider everything that has happened, and will happen, in your life and in the rest of the world. And out of that pops a number. That’s the idea.
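To illustrate just the shape of that idea, here’s a minimal sketch (the features and numbers are invented, and obviously no real world history fits in a little data structure like this):

```python
# Illustrative sketch only: a utility function maps an entire world
# history to a single number. The two features below are invented
# stand-ins; a real world history is far richer than any data structure.

from dataclasses import dataclass

@dataclass
class WorldHistory:
    total_happy_life_years: float
    humanity_goes_extinct: bool

def utility(history: WorldHistory) -> float:
    """Collapse a whole world history into one number."""
    score = history.total_happy_life_years
    if history.humanity_goes_extinct:
        score -= 1e12  # an arbitrarily chosen large penalty
    return score

print(utility(WorldHistory(total_happy_life_years=5e11,
                           humanity_goes_extinct=False)))  # 500000000000.0
```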
This complexity means that utility functions generally have to be defined somewhat vaguely (unless you’re trying to build an AI). The complexity also allows you a lot of flexibility in deciding what you really value.
The second thing is to think about your preferences. Set up some thought experiments to decide whether you prefer this outcome or that outcome. Don’t think about what you’d actually do if put in a situation to decide between them, or you’ll just worry about the social consequences of making the “unethical” decision. If you value things other than your own happiness, don’t ask which outcome you’d be happier in. Instead just ask: which outcome seems preferable? Which would you consider good news, and which bad news?
You can start writing things down if you like. One of the big things you’ll need to think about is how much you value self versus everyone else. But this may matter less than you think, for reasons I’ll get into later.
The third thing is to think about preferences between uncertain outcomes. This is somewhat technical, and I’d advise a shut-up-and-multiply approach. (You can try to go against that if you like, but you have to be careful not to end up in weirdness, such as getting different answers depending on whether you phrase something as one big decision or as a series of identical little decisions.)
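To make shut-up-and-multiply concrete, here’s a toy sketch; every outcome, probability and utility number below is invented for illustration:

```python
# Toy sketch of "shut up and multiply": score each uncertain option by
# its probability-weighted utility and prefer the higher score. All
# outcomes, probabilities and utilities are made up for the example.

def expected_utility(lottery, utility):
    """A lottery is a list of (probability, outcome) pairs."""
    return sum(p * utility(outcome) for p, outcome in lottery)

# Made-up utilities for made-up outcomes.
toy_utility = {"status quo": 0, "small win": 10,
               "big win": 100, "disaster": -500}.get

option_a = [(1.0, "small win")]                   # a certain modest gain
option_b = [(0.9, "big win"), (0.1, "disaster")]  # a risky larger gain

print(expected_utility(option_a, toy_utility))    # 10.0
print(expected_utility(option_b, toy_utility))    # 40.0, so prefer b
```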
The fourth thing is to ask whether this preference system satisfies the von Neumann-Morgenstern axioms. If it’s at all sane, it probably will. (Again, this is somewhat technical).
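For reference, here are the axioms in their standard textbook form, stated over lotteries L, M, N; if your preference ordering satisfies them, it can be represented by maximising the expected value of some utility function:

```latex
\begin{align*}
&\text{Completeness:}  && L \succeq M \ \text{or} \ M \succeq L \\
&\text{Transitivity:}  && L \succeq M \ \text{and} \ M \succeq N \implies L \succeq N \\
&\text{Continuity:}    && L \succeq M \succeq N \implies \exists\, p \in [0,1] :\ pL + (1-p)N \sim M \\
&\text{Independence:}  && L \succeq M \iff pL + (1-p)N \succeq pM + (1-p)N \quad \forall\, p \in (0,1],\ \forall\, N
\end{align*}
```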
The last thing is to ask yourself: if I prefer outcome A over outcome B, do I want to act in such a way that I bring about outcome A? (continue only if the answer here is “yes”).
That’s it—you now have a shiny new utility function. And I want to help you optimize it. (Though it can grow and develop and change along with yourself; I want this to be a speculative process, not one in which you suddenly commit to an immutable life goal).
You probably don’t feel that anything has changed. You’re probably feeling and behaving exactly the same as you did before. But this is something I’ll have to leave for a later post. Once you start really feeling that you want to maximize your utility then things will start to happen. You’ll have something to protect.
Oh, you wanted to know my utility function? It goes something like this:
It’s the sum of the things I value. Once a person is created, I value that person’s life; I also value their happiness, fun and freedom of choice. I assign negative value to that person’s disease, pain and sadness. I value concepts such as beauty and awesomeness. I assign a large bonus negative value to the extinction of humanity. I weigh the happiness of myself and those close to me more highly than that of strangers, and this asymmetry is more pronounced when my overall well-being becomes low.
Four points: It’s actually going to be a lot more complicated than that. I’m aware that it’s not quantitative and no terminology is defined. I’m prepared to change it if someone points out a glaring mistake or problem, or if I just feel like it for some reason. And people should not start criticizing my behavior for not adhering to this, at least not yet. (I have a lot of explaining still to do).
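Purely to illustrate the shape (not the content) of the above, here’s a toy sketch; every term, weight and number is invented, and as I said, the real thing isn’t quantitative:

```python
# Toy illustration only of the "sum of the things I value" shape.
# Every weight, feature and number is invented for the example; none of
# this is a real quantification of my values.

HUGE = 1e9  # arbitrary stand-in for the extinction penalty

def utility(people, beauty_and_awesomeness, humanity_extinct, my_wellbeing):
    total = 0.0
    for person in people:
        # Weight those close to me more, and more so when my own
        # well-being is low (the asymmetry mentioned above).
        weight = 1.0
        if person["close_to_me"]:
            weight += 2.0 * (1.0 - my_wellbeing)
        total += weight * (10.0 * person["alive"]
                           + person["happiness"]
                           - person["pain"])
    total += beauty_and_awesomeness
    if humanity_extinct:
        total -= HUGE
    return total

# Example call with made-up people.
people = [
    {"alive": 1, "happiness": 5.0, "pain": 1.0, "close_to_me": True},
    {"alive": 1, "happiness": 3.0, "pain": 0.5, "close_to_me": False},
]
print(utility(people, beauty_and_awesomeness=2.0,
              humanity_extinct=False, my_wellbeing=0.4))
```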
Build an FAI and get it to scan your brain.
I think that would just yield your revealed preference function. As I said, trying to optimize that is like a falling apple trying to optimize “falling”. It doesn’t describe what you want to do; it describes what you’re going to do next no matter what.
No, it wouldn’t. It would read the brain and resolve it into a utility function. If it resolves into a revealed preference function then the FAI is buggy, because I told it to deduce a utility function.
If we accept that what someone ‘wants’ can be distinct from their behaviour, then “what do I want?” and “what will I do?” are two different questions (unless you’re perfectly rational). Presumably, a FAI scanning a brain could answer either question.
I have nothing against your post (in particular, I’m not swinging that downvote hammer that I’m so fond of), but I must complain that this:
is a crime against communication. It’s meaningless, and I’m not talking about a pointless definition game. “Beauty” means different things to different people, but it means something vaguely similar to most people. Importantly, when they hear “beauty”, almost everyone understands what domain is being considered, even if they disagree on particular judgments in that domain. But “awesomeness”? That’s basically saying that you value valuable things.
To put it another way, let’s say that I’m talking about a Magic card, maybe this one. If I say X, I’m communicating Y:
X: “This card is beautiful!” Y: I might be literally talking about the art, or I might be indirectly talking about how the “flavor” (name, art, italic text) interacts with the gameplay mechanics to produce something strongly fantasy-themed.
X: “This card is powerful!” Y: I’m almost certainly talking about the gameplay mechanics. (It’s unlikely that the art is emotionally powerful, or something. If this were gallery art, that meaning could be more likely.)
X: “This card is cheap!” Y: I’m probably talking about the gameplay cost of the card (being cheap makes it powerful). There are several costs, I might be talking about a specific one or all of them together—context would make it clear. I could also be talking about the real-world monetary cost—unlikely, but context would make it clear.
X: “This card is hilarious!” Y: The flavor text (italics) probably said something amusing (especially likely if it were a Goblin card). Possibly, the gameplay mechanic does something clever (more likely if this were the card that says “you win when you have exactly 1 hitpoint”). In fact, I just noticed that it’s both an Artifact Creature and a Human, which basically never happens, yet fits with the art—that’s pretty funny.
X: “This card is awesome!” Y: I have communicated nothing whatsoever, except for the fact that I like the card. Unless you know what I tend to like—powerful cards, funny cards, white-haired pretty robot girls—you’ll have no clue why I specifically value this card.
Summary: Whatever you’re trying to do, you’ll be more effective at it if you communicate clearly.
“This X is awesome” communicates, minimally, that X inspires awe. It’s a perfectly distinct and valid component of an aesthetic. I would own to its presence in my own (pseudo)utility function, although I’d probably say “Burkean sublimity” instead for the signalling value.
I think you’re technically correct, but especially on the internet, the term “awesome” has been used more and more loosely such that it nearly does have the super-general meaning that STL is talking about. To say that X is awesome is usually just a strong, emphatic way of saying that “I like X”.
I actually like this as an indication of where your sequence is going. It seems promising. Not shockingly unlike anything seen before—you agree enough with the philosophy you found here that you’re explaining your own position with Sequence linkage—but it has a freshness and coherence which says that at least we’re off to a good start.
The trouble with coming up with a utility function is that people seem to start by coming up with ones they think sound good, i.e. stated utility function as social signaling. That is pretty much moral system as social signaling, i.e. saying “I work by this system, so you can predict me well enough to trust me to deal with me.”
This effect has to be understood and consciously worked past to get the actual answer to “What is my utility function even if no-one else can see what I’ve worked out?” The last is the big question in my life (and I suspect that of anyone, really) and I still only have fragments of an answer, mostly gained by attempting to observe my behaviour and feelings. This is a hard question.
In a community where instrumental rationality is high-status, there will be social pressure to behave according to your stated utility function. So you have to be careful to at least state a function that’s sufficiently compatible with your real one that people won’t notice the discrepancy. If they do, they will try to help you overcome your “akrasia”.
Note that I said “compatible”, not “similar”. So your real utility function could be “complete selfishness”. If complete selfishness requires support from a rationalist community, you may wish to signal cooperation by stating that your utility function is “complete altruism”.
The result of this is that you will find yourself doing lots of world-improving stuff (to keep the support of the rationalist community), and your true selfish utility function will ensure you have lots of fun, find lots of sexual opportunities, etc. while doing it.
I should have mentioned the “stated” vs. “private” distinction in the above post. I’ll write these ideas up in a future post, but I’ll probably need to explain a bit more of my view on signalling first.
Perhaps consider links to:
http://lesswrong.com/lw/116/the_domain_of_your_utility_function/
http://en.wikipedia.org/wiki/Revealed_preference
Done. Thank you.
I noticed there was no explicit reference to truth (as in accurate mappings of the territory, not as in truth with a capital T). Am I to assume that goes along with protecting life, stopping extinction, and possibly as a subset of other people’s pursuits of happiness?
Yes—I probably value keeping people well-informed, in addition to what I mentioned. If nothing else, this would stop me from assigning a very high utility to “everyone gets wireheaded”.
Keeping myself well-informed is more of a sub-goal than an end value. I can’t improve the territory unless my map is accurate.
Not quite. Examining in detail the actual reasons for your past decisions, or those you could counterfactually have made, can show you which real-world factor or cognitive mechanism should have been different, and how; this would allow you to obtain a specific better judgment in that situation (or in a more general class of situations). This is the way you can point out cognitive biases. It’s wrong to discard your existing intuitions, even if you know that they are no good in the greater scheme of things, because they may well be the best you’ve got, or the best raw material you’ve got for developing better ones.
Agreed—I was too dismissive of what can be learnt from past decision-making experiences.
I was just pointing out that “I’m always timid in social situations therefore I want to be timid in social situations” is invalid reasoning.
To a first approximation, and as a heuristic rule, it is valid. There are specific additional reasons to believe the conclusion invalid, and they have to do with things other than the way the initial faulty conclusion was generated. You believe the reasoning invalid because you know the conclusion to be invalid, but not the other way around.
I meant “want” as in “this is one of my life goals; I would not wish to self-modify to be any other way”
I would have a similar function, assuming that by “humanity” you mean beings with humane-ish values rather than just H. sapiens.
Yes—my function as stated becomes completely incoherent when applied to transhuman societies. If and when that becomes an issue I’ll have a lot of hard thinking to do.
This is a brave thing to state, considering that transhumanism is such a big focus of this site. Transhumanists will be much less interested in your utility function if you don’t even claim that it is future-proof. But personally, I sympathize, because I believe that this future-proofing is a very hard, maybe even impossible task. (See my comments at this thread.)
I like what you’re trying to do and I hope you succeed, but I have some nitpicks.
My utility function corresponds to specific structures in my brain.
I don’t think that it’s that easy to figure out a utility function. There have been many things in my life that I’ve thought about and decided weren’t that important, but then when I’ve actually tried them I discovered that I really really enjoy them. (Like, having a girlfriend, large numbers of friends, social hobbies, etc.)
You could say that that’s a failure of my epistemic rationality, but I suspect that signalling is going to dominate hypotheticals that I think about but have never actually experienced. I think I would need a lot of practice in the inside view to be able to actually judge if I want something without trying it first.
Do you mean there’s a particular structure in your brain sending out “good” and “bad” messages, and that’s what you’re trying to optimize? (i.e. that you would choose to wirehead).
Or do you mean in the sense that “utility functions have something to do with expressed and revealed preferences which necessarily correspond to structures in my brain?”
Could they be covered under the umbrella of “I value things which I would enjoy if I tried”?
More the latter.
Yes, but I don’t know how to optimize for those things (other than being generally adventurous/curious), since I don’t know if I would enjoy them.
This has prompted me to add a para to Utility function noting that the model doesn’t predict individual human behaviour very well. Needs more references, particularly to the utility and disutility calculators operating pretty much separately. (I remember pjeby linked a study on this, I have no idea what the URL is.) That this is a ridiculously exploitable cognitive bias does not make it untrue.
Needs an equal excursion into the normative reasons for paying attention to the idea. See also http://wiki.lesswrong.com/wiki/Prospect_theory
It’s a wiki, you know what to do :-) The exploitability needs links too.
(I’m going to be noting in general when something prompts me to fiddle with the wiki, to advertise the thing.)
If I must write on the wiki as well, I won’t get anything else done that needs doing. I think the article became less balanced as a result of your contribution, and this is a negative effect that comes in a package with the positive effect of making it. You’ve improved things, but also made them worse in another sense. (Yes, I’m still procrastinating on the Welcoming Skills thing.)
I must point out that, as you have already noted, the wiki is all but dead. You noted this as if you considered it a bad thing.
If you’re going to have a wiki at all, then you must consider that there are ways that work and ways that don’t. There is lots and lots and lots of experience of others for you to work from in this regard.
(Article polish tends to work on a cycle of 1. polished article. 2. someone adds missing bit, article looks unpolished and lumpy. 3. someone else adds more to balance. 4. maybe a rewrite. 5. repeat. This is how it actually works on wikis that aren’t dead.)
Wikis work on incremental improvement from imperfection, sometimes quite awful imperfection. If you can’t stand imperfection, a wiki is going to be a fundamentally painful tool to use. A healthy wiki is a living thing with lumpy and smelly bits.
There is also a failure mode where a wiki becomes a garbage dump. This is particularly easy when there is little activity, and LW wiki has little activity. If we had several concerned editors, then less control would be better, but not currently where we have none (even I’m not an active editor, I just quality-check). There is no magic, something must work as a ratchet that keeps improving quality, and on popular wikis it’s the sufficient number of editors that don’t let quality drop on the scale of months, even if it drops locally, and low-quality or badly-balanced contributions give them raw material.
The only interesting prospect in the current situation is that reducing control might incite more activity, but I don’t think my attitude is off-putting enough to be a relevant factor shaping the current situation.
If you want the activity to go up, you’re going to have to start with getting content by whatever means, even if you don’t think it’s up to scratch. The way to get it up to scratch is to fix the problems yourself and lead by example, rather than to try to strongly moderate contributions; the latter is how to douse a wiki.
When I’m saying all this stuff, I’m speaking from considerable experience watching all manner of wikis (public and intranet) run, fail to run, start or fail to start. I see no reason the LW wiki is special in these senses. You appear to me to be speaking from personal surmise; I’m speaking from experience in and observation of what generally works and what doesn’t.
Step one: get more contributions. They will be lumpy, and that’s fine. That’s how all wikis start. If you think they’re not good enough, your options are (1) fix them yourself and lead by example, or (2) kill the wiki. (2) is also achievable by working hard enough at dousing enthusiasm, particularly nascent enthusiasm.
Remember that wikis can’t possibly work—by which I mean they seem to jar human expectations—except they do. The trick is you have to keep things slightly more open than you can stand.
Is a highly imperfect and lumpy LW wiki better than no LW wiki? I think it is, and that it will improve with continued participation. You appear to think not.
I’m telling you that to get the good stuff you want, you do have to go through being nowhere near as good first. That’s how wikis work.
Yes, this would work, but the option is not available. I can’t do that; it is not sufficiently important to me. Working on decision theory is more important. (I believe you are currently not qualified to do that in the spirit of LW, and would drive the wiki in a wrong direction. It would become a non-LW wiki covering topics inspired by LW.) There are lots of people who are qualified, but they don’t work on the wiki.
I don’t believe we’ve yet pinpointed a single point of disagreement (that is, a single fact that we both understand but expect to have different properties). Advice you were giving had preconditions that aren’t met here.
If you put it like that, the only option available is to kill the wiki. I expect there are better third options.
I reject your appeal to mystical knowledge. I understand how wikis that work do so.
It’s probably worse than the current LW wiki, which is in turn better than no LW wiki. The only reason to allow the current LW wiki to become imperfect and lumpy would be expectation that this step implements a process that eventually makes it better than it is currently. You need to make that case for this wiki.
I tend to share your perspective. We definitely don’t want a mediocre wiki that doesn’t represent LW content. That would be worse than nothing. Moreover there are already other wikis around that can fill that niche. RationalWiki, perhaps?
While not ideal, it would not be a huge problem if the wiki were limited to a few pages like the sequence index and an expanded glossary. What is important is that we don’t have any content there that is not taken almost directly from unambiguously upvoted posts that were promoted on the front page. If there is anything there that isn’t just a summarized version of a post for the purpose of easy linking then it does not belong.
Then someone (if not you) needs to set out, in detail, what the desired vision for the wiki is. Because at present, it’s entirely unclear what you intend it to be useful for. Saying “no that’s not it, I don’t like that, try again” is not helpful.
Do they care?
You’re making an argument toward shutting it down, as the wrong tool for the job (that you haven’t specified).
Yes, that’s precisely what I’m saying: it will get worse as part of the process of getting to better.
My case is that that’s how wikis work generally. As such, if you disagree, you need to say how this one would come to life by some other process (what that process is and preferably some examples to point to).
I’m not sure what Vladimir’s vision is—and this is more or less his baby—but the wiki is most useful as a way to reduce inferential distance for newcomers by serving as a reference for jargon and cached thoughts.
It does not need to go into any detail. All significant content should be posted in the form of blog posts, where it can (and will, and probably already has been) be discussed in depth.
No lumps allowed! Sparsity and incompleteness are to be preferred to lumps. The wiki does not need to be complete. A standalone wiki cannot afford to be lacking in content. A LW wiki can. Because this is a blog, not a wiki. There is plenty of content here. If all the ‘wiki’ did was have a dozen pages with indexes and a few references then it would still be serving a purpose.
I think that the big difference between David’s viewpoint and yours is that he views a wiki as a living, growing thing. The trouble with your slogans above is that they effectively become:
Did you really mean to make these slogans so strong?
Not really. I view it as a living thing with higher standards and without a willingness to sacrifice quality for growth.
You are toeing a line here between inappropriate and disingenuous. Not only are my assertions of preference not slogans; I was also only reluctantly going along with David’s ‘lumps’ metaphor, because there were more important things to criticize than an awkward description.
You then proceed to overtly misquote me, adding words that change the meaning to something I quite obviously did not intend. Following up with “Did you really mean to make these slogans so strong?” just strikes the logical rudeness home.
Aside from not being what I referred to this does not accurately represent the kind of system that David was describing either.
No, actually it does. There will always be lumps, but any given lump will be temporary.
LW wiki is not a healthy wiki, it’s important to keep this in mind when making decisions about it.
Yes, exactly. This is why we have to start even lumpier and smellier. What I mean is that that will never go away.
Cargo cult is not your friend. Just because healthy wikis have this workflow, doesn’t mean that bringing it here will make our wiki healthy. We need an understanding of how that would incite new editors. I expect that it won’t, and if new editors ever appear, it will happen for other reasons.
It’s experience, not cargo culting. There is a difference.
What brings new editors to a wiki is the wiki, and contributing to it, being personally useful to them. Then the wiki process of collaborative editing kicks in.
(e.g. I edit it to create pages I find personally useful to link to concerning LW concepts, for example. I expect others to have other uses.)
I’m not sure that the model is intended to predict human behaviour? (in the LW worldview at least)
It’s odd—I don’t see defining a utility function for myself as especially hard. Looking at it dispassionately, I want food, access to new information, the ability to make my own decisions, shelter, and sex (in roughly that order of importance, with food at the top), up to the maximum levels of those things I can comfortably cope with, and as much assurance as possible that I can continue to get them. I want to continue to be supplied with those things indefinitely, and to avoid pain. I also want to see as many other utility functions fulfilled as possible—but where a conflict comes up between two different utility functions, I would give preferential treatment to the utility functions of beings who are cognitively closer to me than to those further away.
This means that all else being equal, I would rather see everyone get maximum utilons, but if it’s a choice between satisfying the utility function of a liberal music-loving science fiction fan or that of a conservative religious fundamentalist who wants to ban all non-worship music, I’d choose the former over the latter. But I’d choose the religious fundamentalist over a monkey, a monkey over a dog, a dog over a slug, and a slug over a lump of rock. And were there a way to make the rock happier (should such a concept even make sense) without disadvantaging any of the things higher in the list, I’d want to see that happen too.
I also have a few people whose happiness I rate at least as highly as mine (my wife and my parents are the only ones who wouldn’t necessarily be covered by the ‘cognitively similar’ part), and so my own utility function would give ‘undue’ weight to them, but otherwise seems fairly simple.
Thank you for sharing. As you can see, it seems to be a minority who consider (the broad shape of) their utility function to be both easily accessible and worth declaring publicly.