In favour of a selective CEV initial dynamic

[deleted]21 Oct 2011 17:33 UTC

16 points

Note: I appreciate that at this point CEV is just a sketch. However, it’s an interesting topic and I don’t see that there’s any harm in discussing certain details of the concept as it stands.

1. Summary of CEV

Eliezer Yudkowsky describes CEV—Coherent Extrapolated Volition – here. Superintelligent AI is a powerful genie, and genies can’t be trusted; Friendly AI requires the AI to take as input the entire value computation of at least one human brain, because the failure to take into consideration a relatively small element of the human value set, even whilst optimising in several other respects, is likely to be a disaster. CEV is Yudkowsky’s attempt at outlining a Friendly AI volition-extrapolating dynamic: a process in which the AI takes human brainstates, combines this with its own vast knowledge, and outputs suitable actions to benefit humans.

Note that extrapolating volition is not some esoteric invention of Eliezer’s; it is a normal human behaviour. To use his example: we are extrapolating Fred’s volition (albeit with short distance) if given two boxes A and B only one of which contains a diamond that Fred desires, we give him box B when he has asked us to give him box A, on the basis that he incorrectly believes that box A contains the diamond whereas we know that in fact it is in box B.

Yudkowsky roughly defines certain quantities that are likely to be relevant to the functioning of the CEV dynamic:

Spread describes the case in which the extrapolated volition is unpredictable. Quantum randomness or other computational problems may make it difficult to say with strong confidence (for example) whether person A would like to be given object X tomorrow – if the probability computed is 30%, rather than 0.001%, there is significant spread in this case.

Muddle is a measure of inconsistency. For example person A might resent being given object Y tomorrow, but also resent not being given object Y if it isn’t given to him tomorrow.

Distance measures the degree of separation between one’s current self and the extrapolated self, i.e. how easy it would be to explain a given instance of extrapolated volition to someone. In the case of Fred and the diamond the distance is very short, but superintelligent AI could potentially compute Fred’s extrapolated volition to such a distance that it seems incomprehensible to Fred.

To quote Yudkowsky (I assume that the following remains approximately true today):

As of May 2004, my take on Friendliness is that the initial dynamic should implement the coherent extrapolated volition of humankind.

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

Yudkowsky adds that “it should be easier to counter coherence than to create coherence” where coherence refers to strong, un-muddled and un-spread agreement between multiple individual volitions with no strong disagreement from any others; and that “the initial dynamic for CEV should be conservative about saying ‘yes’ and listen carefully for ‘no’” – the superintelligent optimisation process should seek more consensus before steering humanity into narrow slices of the future, relative to the degree of consensus it needs before steering humanity away from some particular narrow slice of the future (about which it has been warned by elements of the CEV).

CEV is an initial dynamic; it doesn’t necessarily have to be the perfect dynamic of human volition for Friendly AI, but the dynamic should be good enough that it allows the AI to extrapolate an optimal dynamic of volition to which we can then switch over if desirous. “The purpose of CEV as an initial dynamic is not to be the solution, but to ask what solution we want”.

Also, “If our extrapolated volitions say we don’t want our extrapolated volitions manifested, the system replaces itself with something else we want, or else...undergoes an orderly shutdown”.

Finally, Yudkowsky suggests that as a safeguard, a last judge of impeccable judgement could be trusted with putting the seal of approval on the output of the CEV dynamic; if something seems to have gone horribly wrong, beyond mere future shock, he can stop the output from being enacted.

2. CEV of all humankind vs. CEV of a subset of humankind

Let us accept that coherent extrapolated volition, in general, is the best (only?) solution that anyone has provided to the problem of AI friendliness. I can see four ways of implementing a CEV initial dynamic:

Implement a single CEV dynamic incorporating all humans, the output of which affects everyone.
Implement an individual CEV dynamic for each individual human.
Implement a single CEV dynamic incorporating one human only, the output of which affects everyone.
Implement a single CEV dynamic incorporating a limited subset of humans, the output of which affects everyone.

As Yudkowsky discusses in his document, whilst the second option might perhaps be a reasonable final dynamic (who knows?) it isn’t a suitable initial dynamic. This is because if there is more than one CEV running, the way in which the CEV dynamic works in a general sense cannot be re-written without someone’s individual CEV being violated, and the idea behind the initial dynamic is that a superior dynamic may develop from it.

The third option is obviously sub-optimal, because of the danger that any individual person might be a psychopath – a person whose values are in general markedly hostile to other humans. Knowing more and thinking smarter might lead a given psychopath’s more humane values to win out, but we can’t count on that. In a larger group of people, the law of large numbers applies and the risk diminishes.

Yudkowsky favours the first option, a CEV of all humankind; I am more in favour of the fourth option, an initial CEV dynamic incorporating the minds of only a certain subset of humans. It would like to compare these two options on six relevant criteria:

I Schelling points [edit: apologies for the questionable use of a game theory term for the sake of concision]

Clearly, incorporating the minds of all humankind into the intial dynamic is a Schelling point – a solution that people would naturally generate for themselves in the absence of any communication. So full marks to a universal CEV on this criterion.

Answer quickly: what specific group of people – be that a group of people who meet each other regularly, or a group who are distinguished in some other way – would you nominate, if you had to choose a certain subset of minds to participate in the initial dynamic?

What springs to my mind is Nobel Prize winners, and I suspect that this too is a Schelling point. This seems like a politically neutral selection of distinguished human beings (particularly if we exclude the Peace Prize) of superlative character and intellect. Whether some people would object strongly to this selection is one question, but certainly I expect that many humans, supposing they were persuaded for other reasons that the most promising initial dynamic is one incorporating a small group of worthy humans only, would consider Nobel Prize winners to be an excellent choice to rally around.

Many other groups of minds, for example the FAI programming team themselves, would of course seem too arbitrary to gather sufficient support for the idea.

II Practicality of implementation

One problem with a universal CEV that I have never seen discussed is how feasible it would actually be to take extremely detailed recordings of the brain states of all of the humans on Earth. All of the challenges involved in creating Friendly AI are of course extreme. But ceteris paribus, one additional extremely challenging problem is one too many.

A prerequisite for the creation of superintelligent AI must surely be the acquisition of detailed knowledge of the workings of the human brain. However, our having the ability to scan one human brain in extreme detail does not imply that it is economically feasible to scan 7 billion or more human brains in the same way. It might well come to pass that the work on FAI is complete, but we still lack the means to actually collect detailed knowledge of all existing human minds. A superintelligent AI would develop its own satisfactory means of gathering information about human brains with minimal disruption, but as I understand the problem we need to input all human minds into the AI before switching it on and using it to do anything for us.

Even if the economic means do exist, consider the social, political and ideological obstacles. How do we deal with people who don’t wish to comply with the procedure?

Furthermore, let us suppose that we manage to incorporate all or almost all human minds into the CEV dynamic. Yudkowsky admits the possibility that the thing might just shut itself down when we run it – and he suggests that we shouldn’t alter the dynamic too many times in an attempt to get it to produce a reasonable-looking output, for fear of prejudicing the dynamic in favour of the programmers’ preferences and away from humanity’s CEV.

It would be one thing if this merely represented the (impeccably well-intentioned) waste of a vast amount of money, and the time of some Nobel Prize winners. But if it also meant that the economic, political and social order of the entire world had been trampled over in the process of incorporating all humans into the CEV, the consequences could be far worse. Enthusiasm for a second round with a new framework at some point in the future might be rather lower in the second scenario than in the first.

III Safety

In his document on CEV, Yudkowsky states that there is “a real possibility” that (in a universal CEV scenario) the majority of the planetary population might not fall into a niceness attractor when their volition is extrapolated.

The small group size of living scientific Nobel Prize winners (or any other likely subset of humans) poses certain problems for a selective CEV that the universal CEV lacks. For example, they might all come under the influence of a single person or ideology that is not conducive to the needs of wider humanity.

On the other hand, given their high level of civilisation and the quality of character necessary for a person to dedicate his life to science, ceteris paribus I’d be more confident of Nobel Prize winners falling into a niceness attractor in comparison to a universal CEV. How much trust are we willing to place in the basic decency of humankind – to what extent is civilisation necessary to create a human who would not be essentially willing to torture innocent beings for his own gratification? Perhaps by the time humanity is technologically advanced enough to implement AGI we’ll know more about that, but at our current state of knowledge I see little reason to give humans in general the benefit of the doubt.

Yudkowsky asks, “Wouldn’t you be terribly ashamed to go down in history as having meddled...because you didn’t trust your fellows?” Personally, I think that shutting up and multiplying requires us to make our best estimate of what is likely to benefit humankind (including future humans) the most, and run with that. I’d not be ashamed if in hindsight my estimate was wrong, since no-one can be blamed for having imperfect knowledge.

IV Aesthetic standards

In his document, Yudkowsky discusses the likelihood of certain volitions cancelling one another out whilst others add together; metaphorically speaking, “love obeys Bose-Einstein statistics while hatred obeys Fermi-Dirac statistics”. This supports the idea that extrapolating volition is likely to produce at least some useful output – i.e. having minimal spread and muddle, ideally at not too far a distance.

In a universal CEV this leads us to believe that Pakistani-Indian mutual hatred, for example, cancels out (particularly since coherence is easier to counter than to create) whereas their mutual preferences form a strong signal.

The problem of aesthetic standards concerns the quality of the signal that might cohere within the CEV. Love seems to be a strong human universal, and so we would expect love to play a strong role in the output of the initial dynamic. On the other hand, consider the difference in intelligence and civilisation between the bulk of humanity and a select group such as Nobel Prize winners. Certain values shared by such a select group, for example the ability to take joy in the merely real, might be lost amidst the noise of the relatively primitive values common to humanity as a whole.

Admittedly, we can expect “knowing more” and “growing up farther together” to improve the quality of human values in general. Once an IQ-80 tribesman gains more knowledge and thinks faster, and is exposed to rational memes, he might well end up in exactly the same place as the Nobel Prize winners. But the question is whether it’s a good idea to rely on a superb implementation of these specifications in an initial dynamic, rather than taking out the insurance policy of starting with substantially refined values in the first place – bearing in mind what is at stake.

A worst case scenario, assuming that other aspects of the FAI implementation work as planned, is that the CEV recommends an ignoble future for humanity – for example orgasmium – which is not evil, but is severely lacking in aesthetic qualities that might have come out of a more selective CEV. Of course, the programmers or the last judge should be able to veto an undesirable output. But if (as Yudkowsky recommends) they only trust themselves to tweak the dynamic a maximum of three times in an effort to improve the output before shutting it off for good if the results are still deemed unsatisfactory, this does not eliminate the problem.

V Obtaining a signal

It seems to me that the more muddle and spread there is within the CEV, the greater the challenge that exists in designing an initial dynamic that outputs anything whatsoever. Using a select group of humans would ensure that these quantities are minimised as far as possible. This is simply because they are likely to be (or can be chosen to be) a relatively homogeneous group of people, who have relatively few directly conflicting goals and possess relatively similar memes.

Again, why make the challenge of FAI even more difficult than it needs to be? Bear in mind that failure to implement Friendly AI increases the likelihood of uFAI being created at some point.

VI Fairness

In his document on CEV, Yudkowsky does go some way to addressing the objections that I have raised. However, I do not find him persuasive on this subject:

Suppose that our coherent extrapolated volition does decide to weight volitions by wisdom and kindness – a suggestion I strongly dislike, for it smacks of disenfranchisement. It don’t think it wise to tell the initial dynamic to look to whichever humans judge themselves as wiser and kinder. And if the programmers define their own criteria of “wisdom” and “kindness” into a dynamic’s search for leaders, that is taking over the world by proxy. You wouldn’t want the al-Qaeda programmers doing that, right?

Firstly, the question of disenfranchisement. As I suggested earlier, this constitutes a refusal to shut up and multiply when dealing with a moral question. “Disenfranchisement” is a drop in the ocean of human joy and human suffering that is at stake when we discuss FAI. As such, it is almost completely irrelevant as an item of importance in itself (of course there are other consequences involved in the choice between universal CEV and a degree of disenfranchisement – but they have been discussed already, and are beside the point of the strictly moral question.) This is especially the case since we are only talking about the initial dynamic here, which may well ultimately develop into a universal CEV.

Secondly, there is the mention of al-Qaeda. In the context of earlier mentions of al-Qaeda programmers in the document on CEV, Yudkowsky appears to be positing a “veil of ignorance” – we should behave in creating the FAI as we would want al-Qaeda programmers to behave. This is strange, because in a similar veil of ignorance problem – the modesty argument – Robin Hanson argued that we should act as though there is a veil of ignorance surrounding whether it is ourselves or someone else who is wrong in some question of fact, whereas Eliezer argued against the idea.

Personally I have little regard for veil of ignorance arguments, on the basis that there is no such thing as a veil of ignorance. No, I would not want the al-Qaeda programmers to nominate a group of humans (presumably Islamic fanatics) and extrapolate their volition – I would rather they used all of humanity. But so what? I am quite happy using my own powers of judgement to decide that al-Qaeda’s group is inferior to humanity as a whole, but Nobel Prize winners (for example) are a better choice than humanity as a whole.

As for “taking over the world by proxy”, again SUAM applies.

3. Conclusion

I argue that a selective CEV incorporating a fairly small number of distinguished human beings may be preferable to a CEV incorporating all of humanity. I argue that the practical difficulty of incorporating all humans into the CEV in the first place is unduly great, and that the programming challenge is also made more difficult by virtue of this choice. I consider any increase in the level of difficulty in the bringing into existence of FAI to be positively dangerous, on account of the fact that this increases the window of time available for unscrupulous programmers to create uFAI.

Setting aside the problem of getting the initial dynamic to work at all, I also consider it to be possible for the output of a selective CEV to be more desirable to the average human than the output of a universal CEV. The initial dynamic is the creation of human programmers, who are fallible in comparison to a superintelligent AI; their best attempt at creating a universal CEV dynamic may lead to the positive values of many humans being discarded, lost in the noise.

In other words, the CEV initial dynamic shouldn’t be regarded as discovering what a group of people most desire collectively “by definition”—it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.

Furthermore, desirability of the CEV ouput to the average human in existence today should be weighed against the desires of (for example) sentient human uploads created in a post-singularity scenario. Shutting up and multiplying demands that FAI programmers and other people of influence set aside concerns about being “jerks” when estimating the probability that extrapolating the volition of humanity en masse is the best way of meeting their own moral standards.

What links here?

[deleted]'s comment on Criticisms of intelligence explosion by lukeprog (22 Nov 2011 23:16 UTC; 14 points)

[deleted]21 Oct 2011 17:33 UTC

16 points

114 comments11 min readLW link Archive

Coherent Extrapolated Volition

steven0461 21 Oct 2011 19:57 UTC
18 points
0
Nobel prize winners differ from ordinary folk mostly in their smarts, but CEV already asks what we’d think if we were smarter. I don’t see any reason to think doing great science is strongly correlated with moral character, and if you were looking to select for moral character, I’m sure there’d be better Schelling points to aim for.
- Viliam_Bur 22 Oct 2011 14:14 UTC
  −10 points
  0
  Parent
  
  if you were looking to select for moral character, I’m sure there’d be better Schelling points to aim for.
  
  My guess: Pope and Dalai Lama and Eliezer Yudkowsky… but without Eliezer.
  - Normal_Anomaly 23 Oct 2011 0:02 UTC
    2 points
    0
    Parent
    Are you saying that a lot of people would hit on “Pope and Dalai Lama,” for the initial extrapolation, or that that would be a good idea? Because they would hit on it, and it wouldn’t be a good idea.
    - Viliam_Bur 23 Oct 2011 10:11 UTC
      3 points
      0
      Parent
      
      Are you saying that a lot of people would hit on “Pope and Dalai Lama,” for the initial extrapolation, or that that would be a good idea?
      
      Schelling point is a solution that people can agree on without any prior communication, just based on their general knowledge.
      
      Imagine that at this very moment there is an alien invasion on Earth. Aliens are generally benevolent, they want to give us technology and whatever, because they are incredibly advanced. They want to speak with the “morality spokesman” of the humankind… and are deeply horrified that we have no such official function. For a short moment they contemplate the possibility of exterminating this immoral species, but then they decide to give us a benefit of doubt—even if we have no official global morality and no official global morality leaders, we obviously are capable of moral thinking, and our individual moralities are significantly correlated, so perhaps they could help humanity make the next necessary step.
      
      All humans are put into bubbles, so they cannot communicate with each other. Then everyone must say who in their opinion is the “morality spokesman” of humankind (it can be an individual or a group with less than thousand members). The majority vote wins. Then everyone, who did not vote for the winner, is killed. At this moment, the “morality spokesman” will receive the technology from aliens and will decide how humanity should use it.
      
      Perhaps you, as a rational person, have serious doubts about this whole process. You don’t understand why humanity should have one or few “morality spokesmen”, why they should be selected by this kind of voting, or why everyone else should die. I agree with you; but we don’t make the rules, aliens do. At this moment you try to survive the election. Who would you vote for?
      
      (Part of this story is a metaphor for super-human AI. The part about killing those who disagree with majority votes, serves to illustrate the Schelling point—you are supposed to answer the question not as you want, but as you guess other people will. Some of them will give a honest answer, but many will try to survive just as you do.)
      
      I suspect that (at least in a Western world) “Pope” and “Dalai Lama” would be the most frequent answers. If you disagree, say your candidates. (Note: You cannot vote for Mother Theresa, she is dead.)
      - Normal_Anomaly 23 Oct 2011 15:23 UTC
        3 points
        0
        Parent
        I agree with what you’re saying here, that if my goal was to survive I would pick the Pope. Though I’m not sure how much I’d want to live in a world based off the Pope’s EV. Also, I think the whole point is moot, because the FAI programmers don’t have to pick a Schelling point. They can pick Universal, or a random sample, or call for volunteers, or call for volunteers with some screening test to get rid of sociopaths.
        
        I think we can agree on what I said in the grandparent: the pope would be the biggest one-person schelling point, and it’s not a good choice for initial dynamic.
        [deleted] 24 Aug 2012 6:26 UTC
        1 point
        0
        Parent
        
        I think we can agree on what I said in the grandparent: the pope would be the biggest one-person schelling point, and it’s not a good choice for initial dynamic.
        
        Actually it might not be that bad. The theology of the body thing they have going might however mess up transhumanist aspirations I have and that would suck. But otherwise I expect a world mostly free of disease and poverty with much longer (but perhaps finite) lifespans where Western traditional values are given a boost.
        
        That’s pretty close to utopia when considering most other outcomes. Certainly it would be a more pleasant place to live than Robin Hansons em world.
        Viliam_Bur 24 Oct 2011 19:01 UTC
        1 point
        0
        Parent
        
        the pope would be the biggest one-person schelling point
        
        Yes, exactly.
        
        and it’s not a good choice for initial dynamic
        
        And therefore, choosing a Schelling point for morality as base of CEV is probably not as good idea as it may seem. Unless one believes that ten-person or hundred-person Schelling points for morality would bring dramatically different results.
        
        (And this is basically what I was trying to express in the comment that got so many negative points. Pope could be a Schelling point, Dalai Lama could be a Schelling point… Eliezer Yudkowsky would be a Schelling point inside LW community, but not outside.)
      - Dar_Veter 23 Oct 2011 12:31 UTC
        2 points
        0
        Parent
        
        Aliens are generally benevolent
        
        How would malevolent aliens behave? :-P
        
        I suspect that (at least in a Western world) “Pope” and “Dalai Lama” would be the most frequent answers.
        
        “Western world” is small portion of mankind and, in this scenario, all mankind counts. I cannot see even one Western person out of hundred remember Dalai Lama when facing death and for the rest of the world, the few who heard about him (excepting Tibetan Buddhists) would not appreciate his morality in the slightest.
        
        My vote goes to the Pope—Roman Catholics are the largest religous group worldwide. The result of your gedankenexperiment is fully Catholic world and Crusade decared against the alien scum.
        steven0461 23 Oct 2011 21:22 UTC
        4 points
        0
        Parent
        
        The result of your gedankenexperiment is fully Catholic world and Crusade decared against the alien scum.
        
        It’s extrapolated volition that matters, not current volition. If the Pope had the same beliefs about facts that we do, his most important difference with most of us might well be something like old age.
        [deleted] 24 Aug 2012 6:15 UTC
        0 points
        0
        Parent
        In the thought experiment I would also likely vote pope since he seems by far the most likely candidate to win and also would not be a moral leader so bad that I wouldn’t want to live in that world.
        
        The result of your gedankenexperiment is fully Catholic world
        
        Not actually true, I’m sure lots of educated people would make the guess that the pope is likely to win the election and vote the same way. I’m also pretty sure many non-Catholic Christians might decide he is the best pick likely to win.
        
        I’m also pretty sure almost instantly after the calamity lots of humans would start worshipping the aliens.
        
        and Crusade decared against the alien scum.
        
        Unlikely to happen because of how suicidal that would be and that most Popes being intelligent people would realize this and would I think encourage the “turn the other cheek” memes to deal with the grief and outrage. However a few billion deaths might animate mankind aware of the cost in powerful and difficult to control ways.
        Viliam_Bur 24 Oct 2011 19:09 UTC
        0 points
        0
        Parent
        
        How would malevolent aliens behave?
        
        They would kill us, or worse, without giving us any chance.
        
        Just like a super-human AI without design for friendliness will probably kill us, or worse. An AI designed for friendliness will need some choices from us—for example whether to use CEV of humankind, and how to approximate it if we can’t measure literally every person on the planet—and a bad choice could have horrible consequences.
Jack 21 Oct 2011 19:10 UTC
11 points
0
Most practicality concerns could be addressed by taking a subset through lottery. The argument for Nobel prize winners seems to rest on the programming difficulty point.

I think you underestimate the possibility of serious things going wrong when taking the CEV of a demographically, neurologically and ideologically similar group with unusually large egos.
- [deleted] 21 Oct 2011 19:22 UTC
  1 point
  0
  Parent
  
  Most practicality concerns could be addressed by taking a subset through lottery.
  
  Problems that would remain are actually counting all of the humans on Earth (there are still uncontacted tribes in existence), and ensuring that they comply. Perhaps we could amend your proposal to state that we could have a lottery in which all humans who wish to be participate are free to apply.
  
  A compromise with the idea of a selective CEV would be if a select group of distinguished humans were to be supplemented by random humans chosen via the lottery, or vice versa. Personally, as I said I think that “shut up and multiply” demands the fuzzies derived from being inclusive take a back seat to whatever initial grouping is most likely to produce viable, safe and aesthetically satisfactory output from the CEV dynamic.
  - Jack 21 Oct 2011 19:34 UTC
    5 points
    0
    Parent
    If it were just a matter of fuzzies I would agree with you, but I’m worried about the resulting output being unfriendly to subsets of the world that get left out. Maybe we think the algorithm would identify and extrapolate only the most altruistic desires of the selected individuals—but if that’s the case it is correspondingly unlikely that choosing such a narrow subset would make the programming easier.
    
    Edit: This argument over the ideal ratio of enfranchisement to efficiency is an ancient one in political philosophy. I’m willing to accept that it might be impractical to attain full representation—maybe uncontacted tribes get left out. Rule by the CEV of Nobel prize winners is likely preferable to death but is still suboptimal in the same way that living in a Hobbesian monarchy is worse than living in Rousseau’s ideal state.
    - [deleted] 21 Oct 2011 21:36 UTC
      2 points
      0
      Parent
      In the universal CEV, there is indeed the benefit that no group of humans or individual human (although future humans, e.g. uploads, are a different matter) is without a voice. On the other hand, this only guarantees that the output is not unfriendly to any group or person if the output is very sensitive to the values of single people and small groups. In that case, as I said it seems that the programmers would be more likely to struggle to create a dynamic that actually outputs anything, and if it does output anything it is relatively likely to be disappointing from an aesthetic perspective. That is to say, I don’t see the inclusion of everyone in the CEV as providing much guarantee that the output will friendly to everyone, unless the dynamic is so sensitive to individuals who counter coherence that it outputs nothing or almost nothing at all.
      
      It seems then that in either case—universal CEV or selective CEV—the benevolence of the output depends on whether knowing more, thinking faster and growing up closer together, the extrapolated values of the humans in question will actually be benevolent towards others.
      
      Yudkowksy states that failure to fall into a niceness attractor is a significant possibility, and I am inclined to agree. And it seems to me that to maximise the chances of the CEV output being located in a niceness attractor, we should start from a strong position (humans with nicer-than-average character and great intellect) so we are not relying too much on the programmers having created a totally ideal volition-extrapolating dynamic with perfect implementation of “growing up together” etc.
wedrifid 21 Oct 2011 19:49 UTC
10 points
0

In other words, the CEV initial dynamic shouldn’t be regarded as discovering what a group of people most desire collectively “by definition”—it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.

I am wary of using arguments along the lines of “CEV is better for everyone than CEV”. If calculating based on a subset happens to be the most practical instrumentally useful hack for implementing CEV then an even remotely competent AI can figure that out itself.

I would still implement the CEV option but I’d do it for real reasons.
What links here?
- wedrifid's comment on In favour of a selective CEV initial dynamic by [deleted] (22 Oct 2011 13:48 UTC; 2 points)
- Jack 21 Oct 2011 19:53 UTC
  1 point
  0
  Parent
  That something you want to say in public?
  - wedrifid 21 Oct 2011 19:56 UTC
    5 points
    0
    Parent
    
    That something you want to say in public?
    
    Yes. I really don’t want the volition of psychopaths, suicidal fanatics and jerks in general to be extrapolated in such a way as it could destroy all that I hold dear. Let this be my solemnly sworn testimony made public where all can see. Allow me (wedrifid_2011) to commit to my declaration of my preferences as of the 21st of October by requesting that you quote me, leaving me unable to edit it away.
    - arundelo 21 Oct 2011 21:40 UTC
      8 points
      0
      Parent
      wedrifid wrote:
      
      I am wary of using arguments along the lines of “CEV is better for everyone than CEV”. If calculating based on a subset happens to be the most practical instrumentally useful hack for implementing CEV then an even remotely competent AI can figure that out itself.
      
      I would still implement the CEV option but I’d do it for real reasons.
      
      Jack wrote:
      
      That something you want to say in public?
      
      wedrifid wrote:
      
      Yes. I really don’t want the volition of psychopaths, suicidal fanatics and jerks in general to be extrapolated in such a way as it could destroy all that I hold dear. Let this be my solemnly sworn testimony made public where all can see. Allow me (wedrifid_2011) to commit to my declaration of my preferences as of the 21st of October by requesting that you quote me, leaving me unable to edit it away.
    - Jack 21 Oct 2011 19:58 UTC
      4 points
      0
      Parent
      Yes, but now they see you coming.
    - D_Alex 24 Oct 2011 9:15 UTC
      −1 points
      0
      Parent
      You are treading on treacherous moral ground! Your “jerk” may be my best mate (OK, he’s a bit intense… but you are no angel either!). Your “suicidal fanatic” may be my hero. As for psychopaths, see this.
      
      Also, I can understand “I really don’t want the volition of ANYONE to be extrapolated in such a way as it could destroy all that I hold dear”—why pick on psychopaths, suicidal fanatics and jerks in particular?
      - wedrifid 24 Oct 2011 10:19 UTC
        5 points
        0
        Parent
        
        You are treading on treacherous moral ground! Your “jerk” may be my best mate (OK, he’s a bit intense… but you are no angel either!). Your “suicidal fanatic” may be my hero.
        
        If so then I don’t want your volition extrapolated either. Because that would destroy everything I hold dear as well (given the extent to which you would either care about their dystopic values yourself or care about them getting those same values achieved).
        
        Also, I can understand “I really don’t want the volition of ANYONE to be extrapolated in such a way as it could destroy all that I hold dear”
        
        I obviously would prefer an FAI to extrapolate only MY volition. Any other preference is a trivial reductio to absurdity. The reason to support the implementation of an FAI that extrapolates more generally is so that I can cooperate with other people whose preferences are not too much different to mine (and in some cases may even resolve to be identical). Cooperative alliances are best formed with people with compatible goals and not those whose success would directly sabotage your own.
        
        why pick on psychopaths, suicidal fanatics and jerks in particular?
        
        Do I need to write a post “Giving a few examples does not assert a full specification of a set”? I’m starting to feel the need to have such a post to link to pre-emptively.
        D_Alex 27 Oct 2011 8:07 UTC
        0 points
        0
        Parent
        
        I don’t want your volition extrapolated either.
        
        You are a jerk!
        
        . . . .
        
        See where this approach gets us?
        wedrifid 27 Oct 2011 9:53 UTC
        0 points
        0
        Parent
        
        See where this approach gets us?
        
        Not anywhere closer to understanding how altruism and morality apply to extrapolated volition for a start.
        
        Note that the conditions that apply to the quote but that are not included are rather significant. Approximately it is conditional on your volition being to help other agents do catastrophically bad things to the future light cone.
        
        What I am confident you do not understand is that excluding wannabe accomplices to Armageddon from the set of agents given to a CEV implementation does not even rule out (or even make unlikely) the resultant outcome taking into consideration all the preferences of those who are not safe to include (and just ignoring the obnoxiously toxic ones).
        D_Alex 31 Oct 2011 5:22 UTC
        0 points
        0
        Parent
        
        What I am confident you do not understand is that excluding wannabe accomplices to Armageddon from the set of agents given to a CEV implementation does not even rule out (or even make unlikely) the resultant outcome taking into consideration all the preferences of those who are not safe to include (and just ignoring the obnoxiously toxic ones).
        
        I barely understand this sentence. Do you mean: Excluding “jerks” from CEV does not guarantee that their destructive preferences will not be included?
        
        If so, I totally do not agree with you, as my opinion is: Including “jerks” in CEV will not pose a danger, and saves the trouble of determining who is a “jerk” in the first place.
        
        This is based on the observation that “jerks” are a minority, an opinion that “EV-jerks” are practically non-existent, and an understanding that where a direct conflict exist between EV of a minority and EV of a majority, it is the EV of a majority that will prevail in the CEV. If you disagree with any of these, please elaborate, but use a writing style that does not exceed the comprehension abilities of an M. Eng.
        wedrifid 31 Oct 2011 6:48 UTC
        1 point
        0
        Parent
        I hope you are right. But that is what it is, hope. I cannot know with any confidence that and Artificial Intelligence implementing CEV is Friendly. I cannot know if it will result in me and the people I care about continuing to live. It may result in something that, say, Robin Hanson considers desirable (and I would consider worse than simple extinction.)
        
        Declaring CEV to be optimal amounts to saying “I have faith that everyone is allright on the inside and we would all get along if we thought about it a bit more. Bullshit. That’s a great belief to have if you want to signal your personal ability to enforce cooperation in your social environment but not a belief that you want actual decision makers to have. Or, at least, not one you want them to simply assume without huge amounts of both theoretical and empirical research.
        
        (Here I should again refer you to the additional safeguards Eliezer proposed/speculated on for in case CEV results in Jerkiness. This is the benefit of being able to acknowledge that CEV isn’t good by definition. You can plan ahead just in case!)
        
        If you disagree with any of these, please elaborate, but use a writing style that does not exceed the comprehension abilities of an M. Eng.
        
        It is primarily a question of understanding (and being willing to understand) the content.
        
        This is based on the observation that “jerks” are a minority, an opinion that “EV-jerks” are practically non-existent
        
        You don’t know that. Particularly since EV is not currently sufficiently defined to make any absolute claims. EV doesn’t magically make people nice or especially cooperative unless you decide to hack in a “make nicer” component to the extrapolation routine.
        
        and an understanding that where a direct conflict exist between EV of a minority and EV of a majority, it is the EV of a majority that will prevail in the CEV
        
        You don’t know that either. The ‘coherence’ part of CEV is even less specified than the EV part. Majority rule is one way of resolving conflicts between competing agents. It isn’t the only one. But I don’t even know that AI> results in something I would consider Friendly. Again, there is a decent chance that it is not-completely-terrible but that isn’t something to count on without thorough research and isn’t an ideal to aspire to either. Just something that may need to be compromised down to.
        lessdazed 31 Oct 2011 7:52 UTC
        0 points
        0
        Parent
        
        The ‘coherence’ part of CEV is even less specified than the EV part.
        
        One possibility is the one inclined to shut down rather than do anything not neutral or better from every perspective. This system is pretty likely useless, but likely to be safe too, and not certainly useless. Variants allow some negatives, but I don’t know how one would draw a line—allowing everyone a veto and requiring negotiation with them would be pretty safe, but also nearly useless.
        
        EV doesn’t magically make people nice or especially cooperative
        
        I’m not sure exactly what you’re implying so I’ll state something you may or may not agree with. It seems likely it makes people more cooperative in some areas, and has unknown implications in other areas, so as to whether it makes them ultimately more or less cooperative, that is unknown. But the little we can see is of cooperation increasing, and it would be unreasonable to be greatly surprised in the event that were found to be the overwhelming net effect.
        
        But I don’t even know that AI> results in something I would consider Friendly.
        
        As most possible minds don’t care about humans, I object to using “unfriendly” to mean “an AI that would result in a world that I don’t value.” I think it better to use “unfriendly” to mean those minds indifferent to humans and the few hateful ones. Those that have value according to many but not all, such as perhaps those that seriously threaten to torture people, but only when they know those threatened will buckle, are better thought of as being a subspecies of Friendly AI.
        wedrifid 31 Oct 2011 13:09 UTC
        0 points
        0
        Parent
        
        As most possible minds don’t care about humans, I object to using “unfriendly” to mean “an AI that would result in a world that I don’t value.” I think it better to use “unfriendly” to mean those minds indifferent to humans and the few hateful ones. Those that have value according to many but not all, such as perhaps those that seriously threaten to torture people, but only when they know those threatened will buckle, are better thought of as being a subspecies of Friendly AI.
        
        I disagree. I will never refer to anything that wants to kill or torture me as friendly. Because that would be insane. AIs that are friendly to certain other people but not to me are instances of uFAIs in the same way that paperclippers are uFAIs (that are Friendly to paperclips). I incidentally also reject FAI and FAI. Although in the latter case I would still choose it as an alternative to nothing (which likely defaults to extinction).
        
        Mind you the nomenclature isn’t really sufficient to the task either way. I prefer to make my meaning clear of ambiguities. So if talking about “Friendly” AI that will kill me I tend to use the quotes that I just used while if I am talking about something that is Friendly to a specific group I’ll parameterize.
        Expand this thread
        lessdazed 31 Oct 2011 16:21 UTC
        0 points
        0
        Parent
        
        I will never refer to anything that wants to kill or torture me as friendly
        
        OK—this is included under what I would suggest to call “Friendly”, certainly if it only wanted to do so instrumentally, so we have a genuine disagreement. This is a good example for you to raise, as most even here might agree with how you put that.
        
        Nonetheless, my example is not included under this, so let’s be sure not to talk past each other. It was intended to be a moderate case, one in which you might not call something friendly when many others here would* - one in which a being wouldn’t desire to torture you, and would be bluffing if only in the sense that it had scrupulously avoided possible futures in which anyone would be tortured, if not in other senses (i.e. it actually would torture you, if you chose the way you won’t).
        
        As for not killing you, that sounds like an obviously badly phrased genie wish. As a similar point to the one you expressed would be reasonable and fully contrast with mine I’m surprised you added that.
        
        One can go either way (or other or both ways) on this labeling. I am apparently buying into the mind-projection fallacy and trying to use “Friendly” the way terms like “funny” or “wrong” are regularly used in English. If every human but me “finds something funny”, it’s often least confusing to say it’s “a funny thing that isn’t funny to me” or “something everyone else considers wrong that I don’t consider “wrong” (according to the simplest way of dividing concept-space) that is also advantageous for me”. You favor taking this new term and avoiding using the MPF, unlike for other English terms, and having it be understood that listeners are never to infer meaning as if the speaker was committing it, I favor just using it like any other term.
        
        So:
        
        Mind you the nomenclature isn’t really sufficient to the task either way
        
        My way, a being that wanted to do well by some humans and not others would be objectively both Friendly and Unfriendly, so that might be enough to make my usage inferior. But if my molecules are made out of usefulonium, and no one else’s are, I very much mind a being exploiting me for that, but wouldn’t mind other humans calling that being friendly when it uses the usefulonium to shield the Earth from a supernova, or whatever—and it’s not just not minding by comparison, either.
        
        *I mean both when others refer to beings making analogous threats to them and to the one that would make them to you.
        lessdazed 31 Oct 2011 7:25 UTC
        0 points
        0
        Parent
        
        Do you mean: Excluding “jerks” from CEV does not guarantee that their destructive preferences will not be included?
        
        If so, I totally do not agree with you
        
        Through me, my dog is included. All the more so mothers’ sons!
        
        an understanding that where a direct conflict exist between EV of a minority and EV of a majority, it is the EV of a majority that will prevail in the CEV.
        
        I don’t think this is true, the safeguard that’s safe is to shut down if a conflict exists. That way, either things are simply better or no worse; judging between cases when each case has some advantages over the other is tricky.
        lessdazed 26 Oct 2011 13:33 UTC
        0 points
        0
        Parent
        
        If so then I don’t want your volition extrapolated either. Because that would destroy everything I hold dear
        
        How? As is, psychopaths have some influence, and I don’t consider the world worthless. Whatever their slice of a much larger pie, how would that be a difference in kind, something other than a lost opportunity?
        wedrifid 26 Oct 2011 23:15 UTC
        4 points
        0
        Parent
        There is a reasonable good chance that when averaged out by the currently unspecified method used by the CEV process that any abominable volitions are offset by volitions that are at least vaguely acceptable. But that doesn’t mean including Jerks (where ‘Jerk’ is defined as agents whose extrapolated volitions are deprecated) in the process that determines the fate of the universe is The Right Thing To Do any more than including paperclippers, superhappies and babyeaters in the process is obviously The Right Thing To Do.
        
        CEV might turn out OK. Given the choice of setting loose a {Superintelligence Optimising CEV} or {Nothing At All nothing at all and we all go extinct} I’ll choose the former. There are also obvious political reasons why such a compromise might be necessary.
        
        If anyone thinks that CEV is not a worse thing to set loose than CEV then they are not being altruistic or moral they are being confused about a matter of fact.
        
        Disclaimer that is becoming almost mandatory in this kind of discussion: altruism, ethics and morality belong inside utility functions and volitions not in game theory or abstract optimisation processes.
        lessdazed 27 Oct 2011 0:00 UTC
        0 points
        0
        Parent
        
        But that doesn’t mean including Jerks (where ‘Jerk’ is defined as agents whose extrapolated volitions are deprecated) in the process that determines the fate of the universe is The Right Thing To Do
        
        Sure, inclusion is a thing that causes good and bad outcomes, and not necessarily net good outcomes.
        
        There are also obvious political reasons why such a compromise might be necessary.
        
        Sure, but it’s not logically necessary that it’s a compromise, though it might be. It might be that the good outweighs the bad, or not, I’m not sure from where I stand.
        
        If anyone thinks that CEV is not a worse thing to set loose than CEV then they are not being altruistic or moral they are being confused about a matter of fact.
        
        Because I value inclusiveness more than zero, that’s not necessarily true. It’s probably true, or better yet, if one includes the best of the obvious Jerks with the rest of humanity, it’s quite probably true. All else equal, I’d rather an individual be in than out, so if someone is all else equal worse than useless but only light ballast, having them is a net good.
        
        Disclaimer
        
        It’s Adam and Eve, not Adam and Vilfredo Pareto!
        wedrifid 27 Oct 2011 0:26 UTC
        0 points
        0
        Parent
        
        Disclaimer
        
        It’s Adam and Eve, not Adam and Vilfredo Pareto!
        
        Huh? Chewbacca?
        lessdazed 27 Oct 2011 0:59 UTC
        0 points
        0
        Parent
        I think your distinction is artificial, can you use it to show how an example question is a wrong question and another isn’t, and show how your distinction sorts among those two types well?
        wedrifid 27 Oct 2011 1:18 UTC
        0 points
        0
        Parent
        Your Adam and and Eve reply made absolutely no sense and this question makes only slightly more. I cannot relate what you are saying to the disclaimer that you partially quote (except one way that implies you don’t understand the subject matter—which I prefer not to assume). I cannot answer a question about what I am saying when I cannot see how on earth it is relevant.
        D_Alex 25 Oct 2011 3:41 UTC
        −1 points
        0
        Parent
        You missed my point 3 times out of 3. Wait, I’ll put down the flyswatter and pick up this hammer...:
        
        Excluding certain persons from CEV creates issues that CEV was intended to resolve in the first place. The mechanic you suggest—excluding persons that YOU deem to be unfit—might look attractive to you, but it will not be universally acceptable.
        
        Note that “our coherent extrapolated volition is our wish if we knew more, were smarter...” etc . The EVs of yourself and that suicidal fanatic should be pretty well aligned—you both probably value freedom, justice, friendship, security and like good food, sex and World of Warcraft(1)… you just don’t know why he believes that suicidal fanaticism is the right way under his circumstances, and he is, perhaps, not smart enough to see other options to strive for his values.
        
        Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section? They deal with the instinctive discomfort of including everyone in the CEV.
        
        (1) that was a backhand with the flyswatter, which I grabbed with my left hand just then.
        wedrifid 25 Oct 2011 6:35 UTC
        8 points
        0
        Parent
        
        Note that “our coherent extrapolated volition is our wish if we knew more, were smarter...” etc . The EVs of yourself and that suicidal fanatic should be pretty well aligned—you both probably value freedom
        
        No. I will NOT assume that extrapolating the volition of people with vastly different preferences to me will magically make them compatible with mine. The universe is just not that convenient. Pretending it is while implementing a FAI is suicidally naive.
        
        Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section? They deal with the instinctive discomfort of including everyone in the CEV.
        
        I’m familiar with the document, as well as approximately everything else said on the subject here, even in passing. This includes Eliezer propozing ad-hoc work arounds to the “What if people are jerks?” problem.
        D_Alex 25 Oct 2011 7:07 UTC
        −6 points
        0
        Parent
        
        No. I will NOT assume
        
        Quite right, don’t assume. Think it through. Then you may be less inclined to pepper your posts with non-sequiturs like “magically”, “pretending” and “naive”.
        
        I’m familiar with the document, as well as approximately everything else said on the subject here, even in passing.
        
        Great! But, IMHO, you have a tendency to miss the point. So:
        
        Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section? They deal with the instinctive discomfort of including everyone in the CEV.
        
        lessdazed 26 Oct 2011 13:26 UTC
        0 points
        0
        Parent
        
        be pretty well aligned
        
        What do you mean? As an analogy, .01% sure and 99.99% sure are both states of uncertainty. EVs are exactly the same or they aren’t. If someone’s unmuddled EV is different than mine—and it will be—I am better off with mine influencing the future alone rather than the future being influenced by both of us, unless my EV sufficiently values that person’s participation.
        
        My current EV places some non-infinite value on each person’s participation. You can assume for the sake of argument each person’s EV would more greatly value this.
        
        You can correctly assume that for each person, all else equal, I’d rather have them than not, (though not necessarily at the cost of having the universe diverted from my wishes) but I don’t really see why the death of most of the single ring species that is everything alive today makes selecting humans alone for CEV the right thing to do in a way that avoids the problem of excluding the disenfranchised whom the creators don’t care sufficiently about.
        
        If enough humans value what other humans want, and more so when extrapolated, it’s an interlocking enough network to scoop up all humans but the biologist who spends all day with chimpanzees (dolphins, octopuses, dogs, whatever) is going to be a bit disappointed by the first-order exclusion of his or her friends from consideration.
        D_Alex 27 Oct 2011 7:51 UTC
        −4 points
        0
        Parent
        I mean, once they both take pains to understand each other’s situation and have a good, long think about it, they would find that they will agree on the big issues and be able to easily accommodate their differences. I even suspect that overall they would value the fact that certain differences exist.
        
        EVs can, of course, be exactly the same, or differ to some degree. But—provided we restrict ourselves to humans—the basic human needs and wants are really quite consistent across an overwhelming majority. There is enough material (on the web and in print) to support this.
        
        Wedrifid (IMO) is making a mistake of confusing some situation dependent subgoals (like say “obliterate Israel” or “my way or the highway”) with high level goals.
        
        I have not thought about extending CEV beyond human species, apart from taking into account the wishes of your example biologists etc. I suspect it would not work, because extrapolating wishes of “simpler” creatures would be impossible. See http://xkcd.com/605/.
        wedrifid 28 Oct 2011 18:31 UTC
        1 point
        0
        Parent
        
        Wedrifid (IMO) is making a mistake of confusing some situation dependent subgoals (like say “obliterate Israel” or “my way or the highway”) with high level goals.
        
        You are mistaken. That I entertain no such confusion should be overwhelmingly clear from reading nearby comments.
        TheOtherDave 27 Oct 2011 13:03 UTC
        1 point
        0
        Parent
        
        I have not thought about extending CEV beyond human species, apart from taking into account the wishes of your example biologists etc. I suspect it would not work, because extrapolating wishes of “simpler” creatures would be impossible.
        
        That sounds awfully convenient. If there really is a threshold of how “non-simple” a lifeform has to be to have coherently extrapolatable volitions, do you have any particular evidence that humans clear that threshold and, say, dolphins don’t?
        
        For my part, I suspect strongly that any technique that arrives reliably at anything that even remotely approximates CEV for a human can also be used reliably on many other species. I can’t imagine what that technique would be, though.
        
        (Just for clarity: that’s not to say one has to take other species’ volition into account, any more than one has to take other individuals’ volition into account.)
        D_Alex 31 Oct 2011 5:37 UTC
        2 points
        0
        Parent
        The lack of threshold is exactly the issue. If you include dolphins and chimpanzees, explicitly, you’d be in a position to apply the same reasoning to include parrots and dogs, then rodents and octopi, etc, etc.
        
        Eventually you’ll slide far enough down this slippery slope to reach caterpillars and parasitic wasps. Now, what would a wasp want to do, if it understood how its acts affect the other creatures worthy of inclusion in the CEV?
        
        This is what I see as the difficulty in extrapolating the wishes of simpler creatures. Perhaps in fact there is a coherent solution, but having only thought about this a little, I suspect there might not be one.
        lessdazed 31 Oct 2011 7:19 UTC
        1 point
        0
        Parent
        
        lack of threshold...then rodents...parasitic wasps
        
        We don’t have to care. If everyone or nearly all were convinced that something less than 20 pounds had no moral value, or a person less than 40 days old, or whatever, that would be that.
        
        Also, as some infinite sums have finite limits, I do not think that small things necessarily make summing humans’ or the Earth’s morality impossible.
        TheOtherDave 31 Oct 2011 18:09 UTC
        0 points
        0
        Parent
        Ah, OK. Sure, if your concern is that, if we extrapolated the volition of such creatures, we would find that they don’t cohere, I’m with you. I have similar concerns about humans, actually.
        
        I’d thought you were saying that we’d be unable to extrapolate it in the first place, which is a different problem.
        pedanterrific 25 Oct 2011 3:48 UTC
        −2 points
        0
        Parent
        
        Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section?
        
        Just, uh… just making sure: you do know that wedrifid has more fourteen thousand karma for a reason, right? It’s actually not solely because he’s an oldtimer, he can be counted on to have thought about this stuff pretty thoroughly.
        
        Edit: I’m not saying “defer to him because he has high status”, I’m saying “this is strong evidence that he is not an idiot.”
        D_Alex 25 Oct 2011 4:13 UTC
        0 points
        0
        Parent
        I admit to being a little embarrassed as I wrote that paragraph, because this sort of thing can come across as “fuck you”. Not my intent at all, just that the reference is relevant, well written, supports my point—and is too long to quote.
        
        Having said that, your comment is pretty stupid. Yes, he has heaps more karma here—so what? I have more karma here than R. Dawkins and B. Obama combined!
        pedanterrific 25 Oct 2011 4:22 UTC
        6 points
        0
        Parent
        (I prefer “Godspeed!”)
        
        The “so what” is, he’s already read it. Also, he’s, you know, smart. A bit abrasive (or more than a bit), but still. He’s not going to go “You know, you’re right! I never thought about it that way, what a fool I’ve been!”
        
        Edit: Discussed here.
        wedrifid 25 Oct 2011 6:19 UTC
        1 point
        0
        Parent
        
        A bit of an ethical egoist (or more than a bit), but still.
        
        I suppose “ethical egoism” fits. But only in some completely subverted “inclusive ethical egoist” sense in which my own “self interest” already takes into account all my altruistic moral and ethical values. ie. I’m basically not an ethical egoist at all. I just put my ethics inside the utility function where they belong.
        pedanterrific 25 Oct 2011 6:35 UTC
        2 points
        0
        Parent
        Duly noted! (I apologize for misconstruing you, also.)
- lessdazed 22 Oct 2011 7:37 UTC
  0 points
  0
  Parent
  
  “better for everyone”
  
  I’m not sure this can mean one thing that is also important.
  - wedrifid 22 Oct 2011 9:11 UTC
    1 point
    0
    Parent
    
    “better for everyone”
    
    I’m not sure this can mean one thing that is also important.
    
    Huh? Yes it can. It means “results in something closer to CEV than the alternative does”, which is pretty damn important given that it is exactly what the context was talking about.
    - lessdazed 22 Oct 2011 13:00 UTC
      0 points
      0
      Parent
      
      that it is exactly what the context was talking about.
      
      I agree that context alone pointed to that interpretation, but as that makes your statement a tautology, I thought it more likely than not you were referencing a more general meaning than the one under discussion. This was particularly so because of the connotations of “wary”, i.e. “this sort of argument tends to seem more persuasive than it should, but the outside view doesn’t rule them out entirely,” rather than “arguments of this form are always wrong because they are logically inconsistent”.
      - wedrifid 22 Oct 2011 13:48 UTC
        2 points
        0
        Parent
        Because Phlebas’s argument is not, in fact, tautologically false and is merely blatantly false I chose to refrain from a (false) accusation of inconsistency.
        [deleted] 22 Oct 2011 19:10 UTC
        0 points
        0
        Parent
        Here is the post that you linked to, in which you ostensibly prove that an excerpt of my essay was “blatantly false”:
        
        Phlebas:
        
        In other words, the CEV initial dynamic shouldn’t be regarded as discovering what a group of people most desire collectively “by definition”—it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.
        
        wedrifid:
        
        I am wary of using arguments along the lines of “CEV is better for everyone than CEV”. If calculating based on a subset happens to be the most practical instrumentally useful hack for implementing CEV then an even remotely competent AI can figure that out itself.
        
        Note that I have made no particular claim about how likely it is that the selective CEV will closer to the ideal CEV of humanity than the universal CEV. I merely claimed that it is not what they most desire collectively “by definition”, i.e. it is not logically necessary that it approximates the ideal human-wide CEV (such as a superintelligence might develop) better than the selective CEV.
        
        [Here] is a comment claiming that CEV most accurately identifies a group’s average desires “by definition” (assuming he doesn’t edit it). So it is not a strawman position that I am criticising in that excerpt.
        
        You argue that even given a suboptimal initial dynamic, the superintelligent AI “can” figure out for a better dynamic and implement that instead. Well of course it “can” – nowhere have I denied that the universal CEV might (with strong likelihood in fact) ultimately produce at least as close an approximation to the ideal CEV of humaity as a selective CEV would.
        
        Nonetheless, high probability =/= logical necessity. Therefore you may wish to revisit your accusation of blatant fallacy. If you are going to use insults, please back them up with a detailed, watertight argument.
        
        How probable exactly is an interesting question, but I shan’t discuss that in this comment since I don’t wish to muddy the waters regarding the nature of the original statement that you were criticising.
        [deleted] 22 Oct 2011 19:17 UTC
        −2 points
        0
        Parent
        Here is the post that you linked to, in which you ostensibly prove that an excerpt of my essay was blatantly false:
        
        Phlebas:
        
        In other words, the CEV initial dynamic shouldn’t be regarded as discovering what a group of people most desire collectively “by definition”—it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.
        
        wedrifid:
        
        I am wary of using arguments along the lines of “CEV is better for everyone than CEV”. If calculating based on a subset happens to be the most practical instrumentally useful hack for implementing CEV then an even remotely competent AI can figure that out itself.
        
        Note that I have made no particular claim in this excerpt about how likely it is that a selective CEV would produce output closer to that of an ideal universal CEV dynamic than a universal CEV would. I merely claimed that a universal CEV dynamic designed by humans is not what humans most desire collectively “by definition”, i.e. it is not logically necessary that it approximates the ideal human-wide CEV (such as a superintelligence might develop) better than the selective CEV.
        
        Here is a comment claiming that CEV most accurately identifies a group’s average desires “by definition” (assuming he doesn’t edit it). So it is not a strawman position that I am criticising in that excerpt.
        
        You argue that even given a suboptimal initial dynamic, the superintelligent AI “can” figure out a better dynamic and implement that instead. Well of course it “can” – nowhere have I denied that the universal CEV might (with strong likelihood in fact) ultimately produce at least as close an approximation to the ideal CEV of humanity as a selective CEV would.
        
        Nonetheless, high probability =/= logical necessity. Therefore you may wish to revisit your accusation of blatant fallacy.
        
        How probable exactly is an interesting question, but best left alone in this comment since I don’t wish to muddy the waters regarding the nature of the original statement that you were criticising.
        wedrifid 23 Oct 2011 8:04 UTC
        −4 points
        0
        Parent
        [deleted] 23 Oct 2011 9:19 UTC
        −2 points
        0
        Parent
        The point being that actually, it is worthwhile to point out simply that it is not a logical necessity—because people actually believe that. Once that is accepted, it clears the way for discussion of the actual probability that the AI does such a good job.
        
        Therefore there is not one thing wrong with the excerpt that you quoted (and if you have a problem with another part of the essay, you should at least point out where the fallacy is).
        [deleted] 23 Oct 2011 10:05 UTC
        −3 points
        0
        Parent
        To address the question of the likelihood of the AI patching things up itself:
        
        How much trust do we put in human programmers? In one instance, they would have to create a dynamic that can apply transformations to Nobel laureates; in the other, they must create a dynamic that can apply transformations to a massive number of mutually antagonistic, primitive, low-IQ and superstitious minds.
        
        Furthermore, although speculation about the details of the implementation becomes necessary, using a small group of minds the programmers could learn about these minds in vast detail, specifically identifying any particular problems and conducting tests and trials, whereas with 7 billion or more minds this is impossible.
        
        The initial dynamic is supposed to be capable of generating an improved dynamic. On the other hand, there are certain things the AI can’t help with. The AI does have vast knowledge of its own, but the programmers have specified the way in which the AI is to “increase knowledge” and so forth of the humans in the first place. This is the distinction wedrifid seems to have missed. If this specification is lousy in the first place, then the output that the AI extracts from extrapolating the volition of humanity might be some way off the mark, in comparison to the ouput if “increasing knowledge” etc. was done in an ideal fashion.
        
        The AI may then go on to implement a new CEV dynamic – but this might be a lousy equilibrium generated by an original poor implementation of transforming the volition of humanity, and this poor reflection of human volition is down to the abilities of the human programmers.
        
        On the other hand, it might take a suboptimal initial dynamic (with suboptimal specifications of “increase knowledge”, “grow up closer together etc.), and manage to locate the ideal dynamic. What I dispute is that this is “blatantly” obvious. That is (motivated) overconfidence regarding a scenario that is purely theoretical, and very vague at this point.
        
        And I certainly dispute that it is necessary “by definition”, which is all I actually claimed in my essay!!
        
        In other words, a superintelligence is not immune to GIGO. Getting an output of some kind from the CEV does not guarantee that the superintelligence has circumvented this problem.
        
        Edit: He has disappeared. How is this for a rational quote:
        
        “This is always the tactic of the denialist: lie and run. He never stays for a fight; he never admits error, even the most glaring; his goal is to pack the maximum insult into the minimum number of words.”
wedrifid 21 Oct 2011 19:43 UTC
8 points
0

Many other groups of minds, for example the FAI programming team themselves, would of course seem too arbitrary to gather sufficient support for the idea.

Depends what you mean by “sufficient support”. The only sufficient support that particularly subgroup needs is stealth.
- Eliezer Yudkowsky 22 Oct 2011 17:39 UTC
  9 points
  0
  Parent
  I wouldn’t go along with it. Marcello wouldn’t go along with it. Jaan Tallinn and Peter Thiel might or might not fund it, but probably not. I’m not saying this couldn’t exist, just that it would have neither the funding nor the people that it currently does.
  - wedrifid 23 Oct 2011 5:47 UTC
    7 points
    0
    Parent
    Unfortunately you don’t currently have either the funding or the people to create an FAI of any kind, selfish or otherwise.
- TheOtherDave 21 Oct 2011 19:52 UTC
  3 points
  0
  Parent
  Or speed.
  - MatthewBaker 23 Oct 2011 12:41 UTC
    0 points
    0
    Parent
    I may be generalizing from fictional evidence here, but isn’t this exactly what Prime Intellect writes to instill in us? (I still don’t know why it was a problem to restart the aliens if you kept them in a similar universe to ours caught in a small enough simulation to monitor.)
  - Jack 21 Oct 2011 19:55 UTC
    0 points
    0
    Parent
    Or force.
    - wedrifid 21 Oct 2011 20:03 UTC
      3 points
      0
      Parent
      Or ostracism, disrespect and mockery.
      - lessdazed 22 Oct 2011 3:50 UTC
        13 points
        0
        Parent
        Our chief weapon is ostracism. Ostracism and disrespect, disrespect and ostracism...our two weapons are ostracism and disrespect...and mockery. Our three weapons...
    - Jayson_Virissimo 22 Oct 2011 9:37 UTC
      0 points
      0
      Parent
      Or luck.
    - pedanterrific 22 Oct 2011 4:25 UTC
      0 points
      0
      Parent
      Or power, or momentum, or mass...
nazgulnarsil 23 Oct 2011 1:14 UTC
7 points
0
I favor a diaspora cev. Why compromise between wildly divergent CEV’s of subsets if you don’t actually have to? In more concrete terms, I’m in favor of holodecking psychopaths.
Vladimir_Nesov 21 Oct 2011 20:53 UTC
7 points
0

veil of ignorance

The idea is that you must set up a mechanism that lets the AI itself to draw good specific judgment, so that if you find yourself needing to rely on your own, it might indicate that you failed that necessary requirement, and you need to go back to the drawing board.
wedrifid 21 Oct 2011 19:51 UTC
7 points
0

Furthermore, desirability of the CEV ouput to the average human in existence today should be weighed against the desires of (for example) sentient human uploads created in a post-singularity scenario.

No, it really shouldn’t. CEV is inclusive. That is, if we care about post human sentient uploads then CEV accounts for that better than we can. That’s the whole point. (This holds unless ‘desirability’ is defined to mean some other arbitrary thing independent of volition of the type we are talking about.)
- [deleted] 21 Oct 2011 20:43 UTC
  4 points
  0
  Parent
  That depends who “we” is referring to. What I meant to say is that if the FAI programmers and other intellectuals believe that extrapolated humans in general might not care about harming uploads (future humans with no voice in that CEV) - whereas a selective CEV of intellectuals is expected to do so—then they should consider this when deciding whether to allow all of humanity to have a say in the CEV, rather than a subset of minds whom they consider to be safer in that regard.
  
  So even if the implementation of a universal CEV as initial dynamic can be expected to reflect the desires of humanity en masse better than a selective initial CEV, this doesn’t define the total moral space that should be of concern to those responsible for and having the power to influence that implementation.
Vladimir_Nesov 21 Oct 2011 20:10 UTC
6 points
0

A prerequisite for the creation of superintelligent AI must surely be the acquisition of detailed knowledge of the workings of the human brain.

Surely not “surely”.
- [deleted] 21 Oct 2011 20:15 UTC
  −2 points
  0
  Parent
  If not, then the particular point I was making is strengthened.
  - Vladimir_Nesov 21 Oct 2011 20:58 UTC
    3 points
    0
    Parent
    I care not which position a flaw supports, and this one seems like grievous overconfidence.
    - [deleted] 21 Oct 2011 21:06 UTC
      −2 points
      0
      Parent
      Out of interest, can you give a rough idea of your probability estimate that a functioning superintelligent AI can be created in a reasonable time-scale without our having first gained a detailed understanding of the human brain—i.e. that an superintelligence is built without the designers reverse-engineering an existing intelligence to any significant extent?
      
      Edit: because there is nothing rational about interpreting words like “surely” literally when they are obviously being used in a casual or innocently rhetorical way.
      What links here?
      lessdazed's comment on In favour of a selective CEV initial dynamic by [deleted] (22 Oct 2011 7:34 UTC; 5 points)
      - Dorikka 22 Oct 2011 20:14 UTC
        1 point
        0
        Parent
        
        because there is nothing rational about interpreting words like “surely” literally when they are obviously being used in a casual or innocently rhetorical way.
        
        You and Nesov either did not interpret your use of ‘surely’ (in context) to mean the same thing, or Nesov thought that additional clarification was needed (a statement which you do not seem to agree with). I’m failing to parse your use of the word rational in this context.
        
        Intention: Helpful information. I may not respond to a reply.
        [deleted] 22 Oct 2011 21:28 UTC
        1 point
        0
        Parent
        If Nesov thought that additional clarification was necessary, he could have said so. But actually he simply criticised the use of the word “surely”.
        
        I consider pedantry to be a good thing. On the other hand, it is at least polite to be charitable in interpreting someone, particularly when the nitpick in question is basically irrelevant to main thrust of the argument.
        
        “Surely” is just a word. Literally it means 100% or ~100% probability, but sometimes it just sounds good or it is used sloppily. If I had to give a number, I’d have said 95% probability that superintelligent AI won’t be developed before we learn about the human brain in detail.
        
        I’m highly amenable to criticism of that estimate from people who know more about the subject, but since my politely phrased request for Nesov’s own estimate was downvoted I decided that this kind of uncharitableness has more to do with status than constructive debate. As such it is not rational.
        
        I retracted the comment on the basis that it was a little petulant but since you asked, there is my explanation.
TrE 21 Oct 2011 18:48 UTC
6 points
0
This is a topic I have recently thought a bit about, although by no means as much as you. I largely agree with your post although I’m not quite sure that nobel laureates (excluding peace) are such a better choice than anything else. The Nobel Prize is not awarded for being a noble person, after all. You don’t have to wear a halo. Since selection effects other than intellect might play a significant role here, I wouldn’t use this group. A random sample of humankind might be better in that regard. I don’t know.
- [deleted] 21 Oct 2011 19:03 UTC
  3 points
  0
  Parent
  My main reason for mentioning Nobel laureates is that it jumped out to me as an obvious group of people who might be suitable, the first time the thought entered my head that the CEV initial dynamic might be selective. Insofar as I am similar to other people, this suggests to me that said group is a Schelling point, which means that it wouldn’t seem like a completely arbitrary choice of group (therefore leading to intractable disagreement about the choice).
  
  The Nobel laureates aren’t angels. Still less so is the average human being an angel. The perfect is the enemy of the good. Your point about unwanted selection effects on the other hand is to be taken seriously.
lessdazed 22 Oct 2011 7:34 UTC
5 points
0
You did a good job with this post and in dealing with the difficult original topic, a topic others may have been shying away from because of its difficulty (that’s my guess) - in any case, it shows up here less often than I would expect, for whatever reasons.

Unpacking the concept of “difficult”, it seems your writings never suffer from specific defects caused by impatience reading, thinking, or writing; clicking “comment” before finishing, transitioning from reading to writing before understanding, that sort of thing.

the thing might just shut itself down when we run it

This is information. It would be more useful than you imply.

The small group size of living scientific Nobel Prize winners (or any other likely subset of humans) poses certain problems for a selective CEV that the universal CEV lacks. For example, they might all come under the influence of a single person or ideology that is not conducive to the needs of wider humanity.

I’m not sure which of two meanings you intended, so I will just explain my view (that I think you probably intended) directly, in a way that pedantically keeps levels separate:

No one cares about the properties of ideologies of humans extrapolated from any more than they care about the color of those humans’ clothes, people just care about the properties of the end product. If the product is the same whether the clothes are red or blue, and the same whether humans’ ideologies are parochially good or universally good or universally bad, then those things are irrelevant.

Shutting up and multiplying demands that

Speaking of the way, if you speak overmuch of the way, you will speak of the way, and speak of the way, and speak of the way, and still be speaking of the way, and continue speaking of the way, still speaking of the way, speaking and speaking of the way, your schedule will be full, full, full of it, of speaking of it, speaking, that is, of the way, that is, speaking of it, verbally, aloud, written, typed, in cursive, in curses, when cursing, speaking overmuch of the way, the way, the way, the way and the way, also, the way, and as for how to speak according to the way...that’s another subject entirely.

It is put better as the last thing here but it would be a ludicrous sin to quote from that section directly. In any case that document’s not scripture, a mere four of the twelve have 95% of the value there, or more if one considers the others detract from those four. Or perhaps I only possess four of twelve equal virtues and am telling myself only four matter to make myself feel strong. Regardless, of them, the last of the twelve is the most important.

You say in a comment on this page: “there is nothing rational about interpreting words like “surely” literally when they are obviously being used in a casual or innocently rhetorical way”.

Stop doing that!
What links here?
- lessdazed's comment on Rationality Quotes October 2011 by MinibearRex (28 Oct 2011 23:53 UTC; 3 points)
- [deleted] 22 Oct 2011 11:17 UTC
  0 points
  0
  Parent
  
  You did a good job with this post
  
  Thanks!
  
  No one cares about the properties of ideologies of humans extrapolated from any more than they care about the color of those humans’ clothes, people just care about the properties of the end product. If the product is the same whether the clothes are red or blue, and the same whether humans’ ideologies are parochially good or universally good or universally bad, then those things are irrelevant.
  
  That is certainly true (the fact that only the output of the dynamic is overwhelmingly important is what I was getting at with those excessive mentions of “shut up and multiply”). But if the implementation of the initial dynamic is less than ideal, then there may not be perfect independence between the output and any strong ideological strain or value that someone managed to implant in a relatively small group of people.
  
  Knowing more, thinking faster and growing up closer together may very well render such a problem completely irrelevant, but that doesn’t constitute a reason to assume that these specifications will do so given their implementation by human programmers. This is a point in favour of the universal CEV, but similar considerations apply in favour of the selective CEV (as I attempted to show in my essay).
  
  Stop doing that!
  
  I’ve retracted that comment. I suppose that the careful use of probabilistic terms is important in general.
Armok_GoB 18 Nov 2011 23:18 UTC
4 points
0
I just got struck by an idea that seems to obvious, to naive, to possibly be true, and which horrified me causing my brain to throw a huge batch or rationalizations at it to stop me from believing something as obviously low status. I’m currently very undecided, but sich it seems like the thing I can’t handle on my own I’ll just leave a transcript of my uncensored internal monologue here:
- What volition do I want to extrapolate
- MY one, tautologically
- But Eliezer, the great leader who is way, way way smarter than you said you shouldn’t and thinking that was evil!
- He also said you shouldn’t just trust him like a great leader and reason from authority like that.
- But you want to maximize the CEV of humanity!
- In that case, it doesn’t matter which is used because they are identical. And I might be wrong about them being identical, in wich case I want my real preferences used, and while concluding that with THIS brain would not be safe, for a CEV superintelligence it would.
- Others will try to stop you!
- UDT. their volitions would be taken into account in proportion to their power to stop me, such that they would using this same reasoning would be better of helping me.
- If everyone does that you’ll be worse of!
- Again, UDT. Whatever CEV gets implemented in the end will take into account all arguments of this sort and modify itself into whatever I SHOULD have made in the first place.
- You are an evil bad poopyhead!
- If I am evil, I want to believe I am evil, and if I am nice, I want to believe I am nice. Or maybe I just want to believe I’m nice regardless but have the AI implement my evil preferences anyway.
- Please note that I do not endorse my every though, and probably will regret posting this in the morning. As you can see, I’m to tired to even correct this obvious contradiction in my beliefs, and to tired to care I know that I believe every statement is true because I believe I believe a contradiction and I believe contradictions imply all statements being true. Or spelling properly.
- TheOtherDave 19 Nov 2011 0:37 UTC
  2 points
  0
  Parent
  Leaving all the in-group/out-group anxiety aside, and assuming I were actually in a position where I get to choose whose volition to extrapolate, there’s three options: …humanity’s extrapolated volition is inconsistent with mine (in which case I get less of what I want by using humanity’s judgement rather than my own),
  ...HEV is consistent with, but different from, mine (in which case I get everything I want either way), or
  ...HEV is identical to mine (in which case I get everything I want either way).
  
  So HEV ⇐ mine.
  
  That said, others more reliably get more of what they want using HEV than using mine, which potentially makes it easier to obtain their cooperation if they think I’m going to use HEV. So I should convince them of that.
  - Armok_GoB 19 Nov 2011 18:44 UTC
    0 points
    0
    Parent
    But they’d prefer just the CEV of you two to the one of all humanity, and the same goes for each single human who’d raise that objection. The end result is the CEV of you+everyone hat could have stopped you. And this dosn’t need handling before you make it either: I’m pretty sure it arises naturally from TDT if you implement your own and were only able to do so because you used this argument on a bunch of people.
Multipartite 22 Oct 2011 12:17 UTC
4 points
0
I unfortunately lack time at the moment; rather than write a badly-thought-out response to the complete structure of reasoning considered, I will for the moment write fully-thought-out thoughts on minor parts thereof that my (?) mind/curiosity has seized on.

‘As for “taking over the world by proxy”, again SUAM applies.‘: this sentence stands out, but glancing upwards and downwards does not immediately reveal what SUAM refers to. Ctrl+F and looking at all appearances of the term SUAM on the page does not reveal what SUAM refers to. The first page of Google results for ‘SUAM’ does not reveal what SUAM refers to.

Hopefully SUAM is a reference to an S U A M acronym used elsewhere in the article or in a different well-known article, but a suggestion may be helpful that if the first then S U A M (SUAM) would be convenient in terms of phrase->acronym, and if the second then a reference to the location or else the expanded form of the acronym would be convenient.

The diamond case: Even if I did want a diamond, I simulate that I would feel nervous, alarmed even, if I indicated that I wanted it to bring me one box and I was brought a different box instead. I’m reminded—though this is not directly relevant—of Google searches, where I on occasion look up a rare word I’m unfamiliar with, and instead am given a page of results for a different (more common) word, with a question at the top asking me if I instead want to search for the word I searched for.

For Google, I would be much less frustrated if it always gave me the results I asked for, and maybe asked if I wanted to search for something else. (That way, when I do misspell something, I’m rightfully annoyed at myself and rightfully pleased with the search engine’s consistent behaviour.) For the diamond case, I would be happy if it for instance noticed that I wanted the diamond and alerted me to its actual location, giving me a chance to change my official decision.

Otherwise, I would be quite worried about it making other such decisions without my official consent, such as “Hmm, you say you want to learn about these interesting branches of physics, but I can tell that you say that because you anticipate doing so will make you happy, so I’ll ignore your request and pump your brain full of drugs instead forever.”. Even if in most cases the outcome is acceptable, for something to second-guess your desires at all means there’s always the possibility of irrevocably going against your will.

People may worry that a life of getting whatever one wants(/asks for) may not be ideal, but I’m reminded of the immortality/bat argument in that a person who gets whatever that person wants would probably not want to give that up for the sake of the benefits that would arguably come with not having those advantages.

In a more general sense, given that I already possess priorities and want them to be fulfilled (and know how I want to fulfill them), I would appreciate an entity helping me to do so, but would not want an entity to fulfill priorities that I don’t hold or try to fulfill them in ways which conflict with my chosen methods of fulfilling them. If creating something that would act according to what one woul want if one /were/ more intelligent or more moral or more altruistic, then A) that would only be desirable if one were such a person currently instead of being the current self, or B) that would be a good upgraded-replacement-self to let loose on the universe while oneself ceasing to exist without seeking to have one’s own will be done (other than on that matter of self-replacement).
- Stuart_Armstrong 22 Oct 2011 12:35 UTC
  5 points
  0
  Parent
  SUAM = shut up and multiply
  - Multipartite 22 Oct 2011 19:56 UTC
    2 points
    0
    Parent
    Ahh. Thank you! I was then very likely at fault on that point, being familiar with the phrase yet not recognising the acronym.
- [deleted] 22 Oct 2011 22:50 UTC
  1 point
  0
  Parent
  
  The diamond case: Even if I did want a diamond, I simulate that I would feel nervous, alarmed even, if I indicated that I wanted it to bring me one box and I was brought a different box instead.
  
  My brief recapitulation of Yudkowsky’s diamond example (which you can read in full in his CEV document) probably misled you a little bit. I expect that you would find Yudkowsky’s more thorough exposition of “extrapolating volition” somewhat more persuasive. He also warns about the obvious moral hazard involved in mere humans claiming to have extrapolated someone else’s volition out to significant distances – it would be quite proper for you to be alarmed about that!
  
  If creating something that would act according to what one would want if one /were/ more intelligent or more moral or more altruistic, then A) that would only be desirable if one were such a person currently instead of being the current self, or B) that would be a good upgraded-replacement-self to let loose on the universe while oneself ceasing to exist without seeking to have one’s own will be done (other than on that matter of self-replacement).
  
  Taken to the extreme this belief would imply that every time you gain some knowledge, improve your logical abilities or are exposed to new memes, you are changed into a different person. I’m sure you don’t believe that – this is where the concept of “distance” comes into play: extrapolating to short distance (as in the diamond example) allows you to feel that the extrapolated version of yourself is still you, but medium or long distance extrapolation might cause you to see the extrapolated self as alien.
  
  It seems to me that whether a given extrapolation of you is still “you” is just a matter of definition. As such it is orthogonal to the question of the choice of CEV as an AI Friendliness proposal. If we accept that an FAI must take as input multiple human value sets in order for it to be safe – I think that Yudkowsky is very persuasive on this point in the sequences – then there has to be a way of getting useful output from those value sets. Since our existing value computations are inconsistent in themselves, let alone with each other the AI has to perform some kind of transformations to cohere a useful signal from this input – this screens off any question of whether we’d be happy to run with our existing values (although I’d certainly choose the extrapolated volition in any case). “Knowing more”, “thinking faster”, “growing up closer together” and so on seem like the optimal transformations for it to perform. Short-distance extrapolations are unlikely to get the job done, therefore medium or long-distance extrapolations are simply necessary, whatever your opinion on the selfhood question.
  
  Eliezer says: “If our extrapolated volitions say we don’t want our extrapolated volitions manifested, the system replaces itself with something else we want, or vanishes in a puff of smoke.” A possible cause of such an output might be the selfhood concern that you have raised.
  - Multipartite 23 Oct 2011 19:25 UTC
    0 points
    0
    Parent
    Diamond: Ahh. I note that looking at the equivalent diamond section, ‘advise Fred to ask for box B instead’ (hopefully including the explanation of one’s knowledge of the presence of the desired diamond) is a notably potentially-helpful action, compared to the other listed options which can be variably undesirable.
    
    Varying priorities: That I change over time is an accepted aspect of existence. There is uncertainty, granted; on the one hand I don’t want to make decisions that a later self would be unable to reverse and might disapprove of, but on the other hand I am willing to sacrifice the happiness of a hypothetical future self for the happiness of my current self (and different hypothetical future selves)… hm, I should read more before I write more, as otherwise redundancy is likely. (Given that my priorities could shift in various ways, one might argue that I would prefer something to act on what I currently definitely want, rather than on what I might or might not want in the future (yet definitely do not want (/want not to be done) /now/). An issue of possible oppression of the existing for the sake of the non-existant… hm.)
    
    To check, does ‘in order for it to be safe’ refer to ‘safe from the perspectives of multiple humans’, compared to ‘safe from the perspective of the value-set source/s’? If so, possibly tautologous. If not, then I likely should investigate the point in question shortly.
    
    Another example that comes to mind regarding a conflict of priorities: ‘If your brain was this much more advanced, you would find this particular type of art the most sublime thing you’d ever witnessed, and would want to fill your harddrive with its genre. I have thus done so, even though to you who owns the harddrive and can’t appreciate it it consists of uninteresting squiggles, and has overwritten all the books and video files that you were lovingly storing.’
    
    Digression: If such an entity acts according to a smarter-me’s will, then theoretically existing does the smarter-me necessarily ‘exist’ as simulated/interpreted by the entity? Put another way, for a chatterbot to accurately create the exact interactions/responses that a sapient entity would, is it theoretically necessary for a sapient entity to effectively exist, simulated by the non-sapient entity, or could such an entity mimic a sapient entity withou sapience entering into the matter? (Would then a mimicked-sapient entity exist in a meaningful sense, but only if there were sapient entities hearing its words and benefiting from its willed actions, compared to if there were only multple mimicked-entities talking to each other? Hrm.) | If a smarter-me was necessarily simulated in a certain sense in order to carry out its will, I might be willing to accede to it in the same spirit as to extremely-intelligent aliens/robots wanting to wipe out humanity for their own reasons, but I would be unwilling to accept things which are against my interests being carried out for the interests of an entity which does not in fact in any sense exist.
    
    Manifestation: It occurs to me that a sandbox version could be interesting to oberve, one’s non-extrapolated volition wanting our extrapolated volitions to be modelled in simulated world-section level 2, and as a result of such a contradiction instead the extrapolated volitions of those in level 2 /not/ being modelled in level 3, yet still being modelled in level 2… again, though, while such a tool might be extremely useful for second-guessing one’s decisions and discussing with one very, very good reasons to rethink them (and thus in fact oneself changing hopefully-beneficially as a person (?) where applicable), something which directly defies one’s will(/one’s curiosity) lacks appeal as a goal (/stepping stone) to work towards.
    - [deleted] 23 Oct 2011 21:34 UTC
      4 points
      0
      Parent
      
      To check, does ‘in order for it to be safe’ refer to ‘safe from the perspectives of multiple humans’, compared to ‘safe from the perspective of the value-set source/s’? If so, possibly tautologous. If not, then I likely should investigate the point in question shortly.
      
      Both. I meant, in order for the AI not to (very probably) paperclip us.
      
      Another example that comes to mind regarding a conflict of priorities: ‘If your brain was this much more advanced, you would find this particular type of art the most sublime thing you’d ever witnessed, and would want to fill your harddrive with its genre. I have thus done so, even though to you who owns the harddrive and can’t appreciate it it consists of uninteresting squiggles, and has overwritten all the books and video files that you were lovingly storing.’
      
      Our (or someone else’s) volitions are extrapolated in the initial dynamic. The output of this CEV may recommend that we ourselves are actually transformed in this or that way. However, extrapolating volition does not imply that the output is not for our own benefit!
      
      Speaking in a very loose sense for the sake of clarity: “If you were smarter, looking at the real world from the outside what actions would you want taking in the real world?” is the essential question – and the real world is one in which the humans that exist are not themselves coherently-extrapolated beings. The question is not “If a smarter you existed in the real world, what actions would it want taking in the real world?”
      
      See the difference?
      
      Digression: If such an entity acts according to a smarter-me’s will, then theoretically existing does the smarter-me necessarily ‘exist’ as simulated/interpreted by the entity?
      
      Hopefully the AI’s simulations of people are not sentient! It may be necessary for the AI to reduce the accuracy of its computations, in order to ensure that this is not the case.
      
      Again, Eliezer discusses this in the document on CEV which I would encourage you to read if you are interested in the subject.
      - Multipartite 24 Oct 2011 15:51 UTC
        1 point
        0
        Parent
        CEV document: I have at this point somewhat looked at it, but indeed I should ideally find time to read through it and think through it more thoroughly. I am aware that the sorts of questions I think of have very likely already been thought of by those who have spent many more hours thinking about the subject than I have, and am grateful that the time has been taken to answer ths specific thoughts that come to mind as initial reactions.
        
        Reaction to the difference-showing example (simplified by the assumption that a sapient smarter-me is assumed to not exist in any form), in two examples:
        
        Case 1: I hypothetically want enough money to live in luxury (and achieve various other goals) without effort (and hypothetically lack the mental ability to bring this about easily). Extrapolated, a smarter me looking at this real world from the outside would be a separate entity from me, have nothing in particular to gain from making my life easier in such a way, and so not take actions in my interests.
        
        Case 2: A smarter-me watching the world from outside may hold a significantly different aesthetic sense than the normal me in the world, and may act to rearrange the world in such a way as to be most pleasing to that me watching from outside. This being done, in theory resulting in great satisfaction and pleasure of the watcher, the problem remains that the watcher does not in fact exist to appreciate what has been done, and the only sapient entities involved are the humans which have been meddled with for reasons which they presumably do not understand, are not happy about, and plausibly are not benefited by.
        
        I note that a lot in fact hinges on the hypothetical benevolence of the smarter-me, and the assumption/hope/trust that it would after all not act in particularly negative ways toward the existant humans, but given a certain degree of selfishness one can probably assume a range of hopefully-at-worst-neutral significant actions which I personally would probably want to carry out, but which I certainly wouldn’t want to be carried out without anyone pulling the strings in fact benefiting from what was being done.
        
        ...hmm, those can be summed up as ‘The smarter-me wouldn’t aid my selfishness!’ and ‘The smarter-me would act selfishly in ways which don’t benefit anyone since it isn’t sapient!’. There might admittedly be a lot of non-selfishness carried out, but that seems like a quite large variation from the ideal behaviour desired by the client-equivalent. I can understand the throwing-out of the individual selfishness for something based on a group and created for the sake of humanity in general, but the taking of selfish actions for a (possibly congomerate) watcher who does not in fact exist (in terms of what is seen) seems as though it remains to be addressed.
        
        ...I also find myself wondering whether a smarter-me would want to have arrays built to make itself even smarter, and backup computers for redundancy created in various places each able to simulate its full sapience if necessary, resulting in the creation of hardware running a sapient smarter-me even though the decision-making smarter-me who decided to do so wasn’t in fact sapient/{in existance}… though, arguably, that also wouldn’t be too bad in terms of absolute results… hmm.
- Multipartite 22 Oct 2011 20:25 UTC
  0 points
  0
  Parent
  Reading other comments, I note my thoughts on the undesirability of extrapolation have largely been addressed elsewhere already.
  
  Current thoughts on giving higher preference to a subset:
  
  Though one would be happy with a world reworked to fit one’s personal system of values, others likely would not be. Though selected others would be happy with a world reworked to fit their agreed system of values, others likely would not be. Moreover, assuming changes over time, even if such is held to a certain degree at one point in time, changes based on that may turn out to be regrettable.
  
  Given that one’s own position (and those of any other subset) are liable to be riddled with flaws, multiplying may dictate that some alternative to the current situation in the world be provided, but it does not necessarily dictate that one must impose one subset’s values on the rest of the world to the opposition of that rest of the world.
  
  Imposition of peace on those filled with hatred who thickly desire war results in a worsening of those individuals’ situation. Imposition of war on those filled with love who strongly esire peace results in a worsening of those individuals’ situation. Taking it as given that each subset’s ideal outcome differs significantly from that of every other subset in the world, any overall change according to the will of one subset seems liable to yield more opposition and resentment than it does approval and gratitude.
  
  Notably, when thinking up a movement worth supporting, such an action is frightening and unstable—people with differing opinions climbing over each other to be the ones who determine the shape of the future for the rest.
  
  What, then, is an acceptable approach by which the wills coincide of all these people who are opposed to the wills of other groups being imposed on the unwilling?
  
  Perhaps to not remake the world in your own image, or even in the image of people you choose to be fit to remake the world in their own image, or even the image of people someone you know nothing about chose to be fit to remake the world in their own image.
  
  Perhaps a goal worth cooperating towards and joining everyone’s forces together to work towards is that of an alternative, or perhaps many, which people can choose to join and will be imposed on all willing and only those who are willing.
  
  For those who dislike the system others choose, let them stay as they are. For those who like such systems more than their current situation, let them leave and be happier.
  
  Leave the technophiles to their technophilia, the… actually I can’t select other groups, because who would join and who would stay depends on what gets made. Perhaps it might end up with different social groups existing under the separate jurisdictions of different systems, while all those who preferred their current state to any systems as yet created remained on Earth.
  
  A non-interference arrangement with free-to-enter alternatives for all who prefer it to the default situation: while maybe not anyone’s ideal, hopefully something that all can agree is better, and something that to no one is in fact worse.
  
  (Well, maybe to those people who have reasons for not wanting chunks of the population to leave in search of a better life..? Hmm.)
ShardPhoenix 22 Oct 2011 13:02 UTC
3 points
0
I don’t see why the FAI creators would base the CEV on anyone other than themselves except to the extent that they need to do so for political reasons. The result of this would by definition be optimal for the creators.
What links here?
- [deleted]'s comment on In favour of a selective CEV initial dynamic by [deleted] (22 Oct 2011 19:17 UTC; -2 points)
- lessdazed 22 Oct 2011 13:33 UTC
  4 points
  0
  Parent
  Considering your other comments, I’m confident you can answer this for yourself with a few hundred seconds worth of thought.
  - ShardPhoenix 22 Oct 2011 14:26 UTC
    1 point
    0
    Parent
    What is that supposed to mean? (It sounds like an oblique insult but I don’t want to jump to negative conclusions).
    - lessdazed 22 Oct 2011 16:38 UTC
      3 points
      0
      Parent
      It was intended as the opposite.
- [deleted] 22 Oct 2011 16:59 UTC
  3 points
  0
  Parent
  
  I don’t see why the FAI creators would base the CEV on anyone other than themselves except to the extent that they need to do so for political reasons. The result of this would by definition be optimal for the creators.
  
  No, not “by definition”. I advised against that idea specifically in the penultimate paragraph of my essay. One reason for this is that the programmers are not infallible, therefore if they face more challenges in programming a dynamic that can cope with one specific group of people in comparison to another group of people, then there’s no reason why the output of from a CEV dynamic including the second group should not be closer to the output of an ideal CEV including the first group.
  
  Another reason: we generally consider individual humans to have interests—not groups of humans. To take the extreme case as a sufficient disproof, an FAI programmer would prefer the CEV to include 1000 kind people not including himself, rather than himself and 999 psychopaths.
  
  The FAI programming group or any other group should not be reified as having interests of its own (as in “optimal for the creators”).
  
  In any case, “if my Auntie had...”—political reasons do exist!
  - ShardPhoenix 22 Oct 2011 23:17 UTC
    2 points
    0
    Parent
    Your first paragraph seems like a technical issue that may or may not apply in practice, not something that really gets at the heart of CEV. For the second paragraph, I guess I was thinking of the limiting case of there being a single creator of the FAI. With groups, it presumably depends on the extent to which you believe (or can measure) that the average individual from some other group is a better match for your CEV than your FAI co-workers are. But if that’s the case then those co-workers will want to include different groups of their own!
    
    I do agree that political issues may be very important in practice though—for example, “we’ll fund you only if you include such-and-such people’s CEV”.
- Vladimir_Nesov 22 Oct 2011 18:08 UTC
  0 points
  0
  Parent
  
  I don’t see why the FAI creators would base the CEV on anyone other than themselves except to the extent that they need to do so for political reasons.
  
  For moral reasons (although if these are true moral reasons, CEV would do that anyway, this particular error seems easy to correct).
timtyler 22 Oct 2011 12:48 UTC
3 points
0

I argue that the practical difficulty of incorporating all humans into the CEV in the first place is unduly great, and that the programming challenge is also made more difficult by virtue of this choice.

Agreed. IMO, CEV is too silly to be worth much in the way of criticism.
Luke_A_Somers 23 Oct 2011 20:30 UTC
2 points
0
Nobel Laureates are highly abnormal people, and not only in intelligence. I would be rather concerned about what might be expected of us all if our future were based on their CEV.
endoself 21 Oct 2011 19:43 UTC
1 point
0
I don’t see why whether a set of people being a Schelling point is relevant; you don’t seem to be analyzing FAI design as a coordination game. If you were using it metaphorically, please do not use a technical game theory term. Can you clarify this?
- [deleted] 21 Oct 2011 20:24 UTC
  2 points
  0
  Parent
  I was referring to idea that other people would be likely to generate that particular solution for themselves when they first consider the problem of “what is the most suitable selective group of humans for CEV”.
  
  Wikipedia quote: “In game theory, a focal point (also called Schelling point) is a solution that people will tend to use in the absence of communication, because it seems natural, special or relevant to them”.
  
  Although this is not actually a coordination game, I don’t believe that changes that the fact “Nobel laureates” is a solution that people would use if the problem was turned into a coordination game. Maybe I am bending the definition of the term (I don’t really think that I am) but I do so on the basis that I can’t think of a poetic alternative, and because the only time I’ve seen someone use that phrase they used it in the same sense that I have.
  - endoself 21 Oct 2011 22:12 UTC
    3 points
    0
    Parent
    I understood that much. I don’t see why that’s important. Does the fact that it would be a candidate for a Schelling point in a coordination-game-ified version of this problem constitute a reason that choosing it would be desirable (a reason that I don’t see due to inferential distance)? If the problem isn’t a coordination game, why analyze it as one?
    
    In regards to other use of the phrase on LW, I have seen other people misuse it here (though I don’t remember where, so I can’t confirm that their usage was incorrect). Part of the reason I responded to you was because I was worried about the meaning being diluted by LWers who had only seen the phrase here rather than actually studying game theory; we seem to pick up such memes from each other more easily than would be optimal.
    - [deleted] 21 Oct 2011 22:34 UTC
      2 points
      0
      Parent
      
      Does the fact that it would be a candidate for a Schelling point in a coordination-game-ified version of this problem constitute a reason that choosing it would be desirable
      
      Yes, in the sense that if it is a Schelling point then it seems less arbitrary in comparison to a group that hardly anyone would think of suggesting. It may be the case that “group X” is a more ideal group of people to participate in a selective CEV than Nobel laureates—but to the vast majority of people, this will seem like a totally arbitrary choice, therefore proponents are likely to get bogged down justifying it.
      
      If you dislike the idea of using the term “Schelling point” in this way, perhaps you could suggest a concise way of saying “choice that would naturally occur to people” to be used outside of specific game theory problems?
      
      I do recognise your objection and will try to avoid using it in this sense in future.
      - endoself 22 Oct 2011 0:30 UTC
        1 point
        0
        Parent
        Okay, this definitely clears things up.
        
        If you dislike the idea of using the term “Schelling point” in this way, perhaps you could suggest a concise way of saying “choice that would naturally occur to people” to be used outside of specific game theory problems?
        
        ‘Low entropy’ is something I would very naturally use. Of course, this is also a technical term. :) I do have a precise meaning for it in my head—“Learning that the chosen group is the set of non-peace Nobel laureates does not give you that much more information in the sense of conditional entropy given that you already know that the group was to be chosen by humans for the purpose of CEV.”—but now that I think about it, that is quite inferentially far from “Non-peace Nobel laureates would be a low entropy group.”. In the context of LW, perhaps a level of detail between those two could avoid ambiguity.
        
        Whether low entropy would be desirable in this context would depend on what you are trying to achieve. In its favour, it would be easier to justify to others as you mentioned, if that is a concern. Apart from that, I would think that the right solution is likely to be a simple one, but that looking for simple solutions is not the best way to go about finding it. Low entropy provides a bit of evidence for optimality, but you already have criteria that you want to maximize; it is better to analyze these criteria than to use a not-especially-good proxy for them, at least until you’ve hit diminishing returns with the analysis. Also, since you’re human, looking at candidate solutions can make your brain try to argue for or against them rather than getting a deeper understanding; that tends not to end well. Since you seem to be looking at this for the purpose of gathering support by avoiding a feeling of “This is the arbitrary whim of the AI designers.”, this isn’t really relevant to the point you were trying to make, but since I misinterpreted you initially, we get a bit more CEV analysis.
[deleted] 22 Oct 2011 17:41 UTC
0 points
0

On the other hand, given their high level of civilisation and the quality of character necessary for a person to dedicate his life to science, ceteris paribus I’d be more confident of Nobel Prize winners falling into a niceness attractor in comparison to a universal CEV.

The Nobel prize (minus the peace) is roughly an award for western academic achievement, and is mostly awarded to Ashkenazic Jews (27%, which is nine times what you might expect by population). Those three factors do not add up to strong global agreement. Extrapolating from your favorite group is the PD equivalent of defecting. Whether you want to defect depends on things like your expectation of being in a PD tournament (whether other people are trying to create unfriendly AGI or sub-optimal friendly AGI, whether they’re competent enough to finish, the time frames in which you operate) and how similar your thought processes are to theirs (whether their first instinct also is to defect, whether they know that you’re planning to defect, whether they’ll defect if they know you defect). (Incidentally, I wouldn’t choose Nobel prize winners even if I was the only AI programmer in existence, for additional reasons.)

You start out saying that option one is “a solution that people would naturally generate for themselves in the absence of any communication” and then switch to questioning “how much trust are we willing to place in the basic decency of humankind”. So perhaps option one, in your terminology, should not called a Schelling point (not everyone tends to pick option one, you defect), but a symmetric Shelling point?

Finally, please consider using a less enthnocentric word than “civilization”. Willingness to torture is not isolated by culture.
- [deleted] 22 Oct 2011 18:35 UTC
  0 points
  0
  Parent
  
  Extrapolating from your favorite group is the PD equivalent of defecting. Whether you want to defect depends on things like your expectation of being in a PD tournament
  
  That’s a different interpretation of the reference to “al-Qaeda programmers” which I find more compelling. I’ll take it as a point in favour of universal CEV. If it were the case that the existence of other simultaneous AGI projects was considered likely as the FAI project came to fruition, then this consideration would become important.
  
  Finally, please consider using a less enthnocentric word than “civilization”.
  
  “Civilisation” is not intended to have any a priori ethnic connotations. Let us always distinguish between values and facts, and that is all I have to say regarding this line of argument.
  - [deleted] 22 Oct 2011 19:43 UTC
    2 points
    0
    Parent
    
    If it were the case that the existence of other simultaneous AGI projects was considered likely as the FAI project came to fruition, then this consideration would become important.
    
    If? IDSIA, Ben Goertzel’s OpenCog, Jeff Hawkin’s Numenta, Henry Markram’s Blue Brain emulation project, and the SIAI are already working toward AGI and none of them are using your “selective second option”. The 2011 AGI conference reviewed some fifty papers on the topic. Projects already exist. As the field grows and computing becomes cheaper, projects will increase.
    
    You write that CEV “is the best (only?) solution that anyone has provided”, so perhaps this is news. If you read the sequences, you might know that Bill Hibbard advocated using human smiles and reinforcement learning to teach friendliness. Tim Freeman has his own answer. Stuart Armstrong came up with a proposal called “Chaining God”. There are regular threads on Lesswrong debating points of CEV and trying to think of alternative strategies. Lukeprog has written on the state of the field of machine ethics. Ben Goertzel has a series of writings on the subject, Thoughts on AI Morality might be a good place to start.
    
    “Civilisation” is not intended to have any a priori ethnic connotations.
    
    I’m glad to hear you didn’t intend that. I do still believe “civilization” generally has strong cultural connotations (which wikipedia and a few dictionaries corroborate) and offered the suggestion to improve your clarity, not to accuse you of racism.
    - [deleted] 22 Oct 2011 21:54 UTC
      4 points
      0
      Parent
      I have read the sequences. Since Yudkowsky so thoroughly refuted the idea of reinforcement learning I don’t think that that idea deserves to be regarded as a feasible solution to Friendly AI.
      
      On the other hand I wasn’t particularly aware of the wider AGI movement, so thanks for that. Obviously when I say simultaneous AGI projects, I mean projects that are at a similarly advanced stage of development at that point in time—but your point stands.
Kutta 22 Oct 2011 6:10 UTC
0 points
0

Personally I have little regard for veil of ignorance arguments, on the basis that there is no such thing as a veil of ignorance. No, I would not want the al-Qaeda programmers to nominate a group of humans (presumably Islamic fanatics) and extrapolate their volition – I would rather they used all of humanity. But so what?

This veil of ignorance is unlike the Rawlsian one. The FAI programmer really is ignorant about features of his morality.