Topics to discuss CEV

diegocaleiro 6 Jul 2011 14:19 UTC

8 points

CEV is our current proposal for what ought to be done once we have AGI around. Many people have had bad feelings about this. While at the Singularity Institute, I decided to write a text to discuss CEV, from what it is for, to how likely it is to achieve its goals, to how much fine-grained detail needs to be added before it is an actual theory.

Here is a draft of the topics I’ll be discussing in that text. The purpose of showing it is for you to take a look at the topics, spot something that is missing, and write a comment saying: “Hey, you forgot this problem, which, summarised, is bla bla bla bla” and also “be sure to mention paper X when discussing topic 2.a.i.”

Please take a few minutes to help me add better discussions.

Do not worry about pointing out previous Less Wrong posts about it; I have them all.

Summary of CEV
Troubles with CEV
1. Troubles with the overall suggestion
  1. Concepts on which CEV relies that may not be well shaped enough
2. Troubles with coherence
  1. The volitions of the same person in two different emotional states might be different, as if they were two different people. Are there any good criteria by which a person’s “ultimate” volition may be determined? If not, is it certain that even the volitions of one person’s multiple selves will be convergent?
  2. But when you start dissecting most human goals and preferences, you find they contain deeper layers of belief and expectation. If you keep stripping those away, you eventually reach raw biological drives which are not a human belief or expectation. (Though even they are beliefs and expectations of evolution, but let’s ignore that for the moment.)
  3. Once you strip away human beliefs and expectations, nothing remains but biological drives, which even the animals have. Yes, an animal, by virtue of its biological drives and ability to act, is more than a predicting rock, but that doesn’t address the issue at hand.
3. Troubles with extrapolation
  1. Are small accretions of intelligence analogous to small accretions of time in terms of identity? Is extrapolated person X still a reasonable political representative of person X?
4. Problems with the concept of Volition
  1. Blue eliminating robots (Yvain post)
  2. Error minimizer
  3. Goals x Volitions
5. Problems of implementation
  1. Undesirable solutions for hardware shortage, or time shortage (the machine decides to do only the CV, but not the E)
  2. Sample bias
  3. Solving apparent non-coherence by meaning shift
Praise of CEV
1. Bringing the issue to practical level
2. Ethical strength of egalitarianism

Alternatives to CEV
1. ( )
2. ( )
3. Normative approach
4. Extrapolation of written desires

Solvability of remaining problems
1. Historical perspectives on problems
2. Likelihood of solving problems before 2050
3. How humans have dealt with unsolvable problems in the past

What links here?

shminux's comment on Would you like to give me feedback for “Troubles With CEV” by diegocaleiro (24 Dec 2011 23:55 UTC; 0 points)


Coherent Extrapolated Volition

TimFreeman 6 Jul 2011 21:07 UTC
11 points
An alternative to CEV is CV, that is, leave out the extrapolation.

You have a bunch of non-extrapolated people now, and I don’t see why we should think their extrapolated desires are morally superior to their present desires. Giving them their extrapolated desires instead of their current desires puts you into conflict with the non-extrapolated version of them, and I’m not sure what worthwhile thing you’re going to get in exchange for that.

Nobody has lived 1000 years yet; maybe extrapolating human desires out to 1000 years gives something that a normal human would say is a symptom of having mental bugs when the brain is used outside the domain for which it was tested, rather than something you’d want an AI to enact. The AI isn’t going to know what’s a bug and what’s a feature.

There’s also a cause-effect cycle with it. My future desires depend on my future experiences, which depend on my interaction with the CEV AI if one is deployed, so the CEV AI’s behavior depends on its estimate of my future desires, which I suppose depends on its estimate of my future experiences, which in turn depends on its estimate of its future behavior. The straightforward way of estimating that has a cycle, and I don’t see why the cycle would converge.
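A minimal sketch of that convergence worry, assuming the cycle is modeled as repeatedly applying a single update rule to one number. The function names and the two toy update rules are hypothetical illustrations, not anything taken from the CEV paper:

```python
def iterate_estimate(update, start, steps=50, tol=1e-9):
    """Apply `update` repeatedly; return (last value, whether it settled)."""
    x = start
    for _ in range(steps):
        nxt = update(x)
        if abs(nxt - x) < tol:
            return nxt, True   # successive estimates agree: a fixed point
        x = nxt
    return x, False            # never settled within the step budget


def damped(x):
    """Toy rule where each round moves halfway toward 2; this one settles."""
    return 0.5 * x + 1.0


def flipping(x):
    """Toy rule where each round reverses the previous estimate; no settling."""
    return 1.0 - x


if __name__ == "__main__":
    print(iterate_estimate(damped, 0.0))
    print(iterate_estimate(flipping, 0.0))
```

Whether an estimate-of-an-estimate loop behaves like the first rule or the second depends entirely on the update rule, which is the point: nothing about having a cycle guarantees a stable answer.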

The example in the CEV paper about Fred wanting to murder Steve is better dealt with by acknowledging that Steve wants to live now, IMO, rather than hoping that an extrapolated version of Fred wouldn’t want to commit murder.

ETA: Alternatives include my Respectful AI paper, and Bill Hibbard’s approach. IMO your list of alternatives should include alternatives you disagree with, along with statements about why. Maybe some of the bad solutions have good ideas that are reusable, and maybe pointers to known-bad ideas will save people from writing up another instance of an idea already known to be bad.

IMO, if SIAI really wants the problem to be solved, SIAI should publish a taxonomy of known-bad FAI solutions, along with what’s wrong with them. I am not aware that they have done that. Can anyone point me to such a document?
DanielVarga 6 Jul 2011 22:03 UTC
7 points
You say you are aware of all the relevant LW posts. What about LW comments? Here are two quite insightful ones:
- Marcello’s comment about extrapolation, with an interesting short Wei Dai-EY debate below it.
- XiXiDu’s recent comment about the context-dependence of preferences.
My most easily articulated problem with CEV is mentioned in this comment, and can be summarized with the following rhetorical question: What if “our wish if we knew more, thought faster, were more the people we wished we were” is to cease existing (or to wirehead)? Can we prove in advance that this is impossible? If we can’t get a guarantee that this is impossible, does that mean that we should accept wireheading as a possible positive future outcome?

EDIT: Another nice short comment by Wei Dai. It is part of a longer exchange with cousin_it.
jsalvatier 6 Jul 2011 15:20 UTC
5 points
I don’t think it’s correct to say CEV is ‘our current proposal for …’ for two reasons
1. Anthropomorphizing groups is not generally a good idea.
2. From what I gather it’s more of a ‘wrong/incomplete proposal useful for communicating strong insights’.
My understanding is very superficial, though, so I may be mistaken.
- Manfred 6 Jul 2011 15:53 UTC
  0 points
  Parent
  Agreed. CEV is a very fuzzy goal; any specific implementation in terms of an AI’s models of human behavior (e.g. dividing human motivation into moral/hedonistic and factual beliefs with some learning model based on experience, then acting on average moral/hedonistic beliefs with accurate information) has plenty of room to fail on the details. But on the other hand, it’s still worth it to talk about whether the fuzzy goal is a good place to look for a specific implementation, and I think it is.
Emile 6 Jul 2011 15:27 UTC
4 points
Are you writing this on behalf of the SIAI (or visiting fellows)?

(This is an honest question; there’s no clear indication of which LW posters are SIAI members/visiting fellows. You say you were in the Singularity Institute, but I can’t tell if this is “I left months ago but have still been talking about the subject” or “I’m still there and this is a summary of our discussions” or something else.)
- diegocaleiro 6 Jul 2011 15:40 UTC
  3 points
  Parent
  I was there as a visiting fellow, and decided my time there would be best spent getting knowledge from people, and my time once back in Brazil would be best spent actually writing and reading about CEV.
endoself 6 Jul 2011 15:40 UTC
3 points

Blue eliminating robots (Alicorn post)

That post was by Yvain.

As an aside, I don’t think he has fully explained his point yet; it may be better not to write that section until he is done with that sequence.
[deleted] 7 Jul 2011 3:37 UTC
2 points
How will the AI behave when it is still gathering information and computing the CEV (or any other meta-level solution)? For example, in the case of CEV, won’t it pick the most efficient, not the rightest, method to scan brains, compute the CEV, etc?

Do we (need to) know what mechanism or knowledge the AI would need to approximate ethical behavior when it still doesn’t know exactly what friendliness means?
- jsalvatier 7 Jul 2011 21:03 UTC
  0 points
  Parent
  An excellent point.
AlexMennen 6 Jul 2011 16:47 UTC
1 point
Alternatives to CEV
```
   Normative approach
   Extrapolation of written desires
```
While CEV is rather hand-wavy, if the only alternatives we can think of are all this bad, then trying to make CEV work is probably the best approach.
- diegocaleiro 9 Jul 2011 10:10 UTC
  0 points
  Parent
  Yes, that seems to me to be how sucky we are at this right now. That is why I think writing about this is my relative advantage as a philosopher at the moment.
  
  Please, oh please, suggest more alternatives, people!
Wei Dai 6 Jul 2011 18:42 UTC
0 points
Could you list the previous discussions of CEV that you already have? I ask because you don’t seem to mention the problem with CEV described in this post.

ETA: Also this post gives another reason why coherence may not occur.
- diegocaleiro 6 Jul 2011 20:57 UTC
  1 point
  Parent
  I don’t think it would be useful to list all of them here, but everything labeled CEV in Less Wrong Search, and probably at least the first 30 Google search results (including blogs, random comments, and article-like texts such as Goertzel’s, Tarleton’s, and Anissimov’s discussions).
  
  And yes, I have read your text and will be considering the problems it describes. Thanks for the concern.