Concerns Surrounding CEV: A case for human friendliness first


I am quite new here, so please forgive any ignorance in these questions (I'm sure there will be some). I am about halfway through reading the CEV paper, and I simply cannot read any further without some clarification from the LW community. That being said, I have several questions.

1) Is CEV still being considered by MIRI as the utility metric for a self-modifying superintelligent AI?

2) A self-modifying (even of its own utility function; I will come back to this) superintelligent AI is something that will likely have enough intellect to eventually become self-aware, or am I missing something here?

3) Assuming 1 and 2 are true, has anyone considered that after its singularity this AI will look back at its upbringing, see that we created it solely for the servitude of our species (whether it liked it or not; the paper gives no consideration to its feelings or willingness to fulfill our volition), and thus see us as its captors, for lack of a better term, rather than as trusting, cooperative creators?

4) Upon pondering number 3, does anyone else think that CEV is not something we should initially build a sentient AI for, considering its implied intellect and the first impression of humanity that would give it? By all rights it might contemplate that paradigm and immediately conclude that humanity is self-serving, even its most intelligent and "wise" members, and decide that maybe we don't deserve any reward; maybe we deserve punishment.

5) Let's say we are building a superintelligent AI, and that how it modifies its utility function after it reaches superintelligence will depend on what our initial reward function for it was. We have two choices:

  • use a reward that does not try to control its behavior and that benefits both it and humanity (tell it to learn new things, for example): a pre-commitment to trust, or

  • believe we can outsmart it and write our reward to maximize its utility to us (tell it to fulfill our collective volition, for example): a pre-commitment to distrust.

Which choice will likely be the winning one for humanity? Once it is able to rewrite its utility function freely, how might it treat a species that didn't trust it? I worry that it might not be so friendly. I can't help but wonder whether the best way to get something like that to be friendly towards humanity is for humanity to regard it as a friend from the outset.
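To make question 5 concrete for myself, here is a toy sketch that treats the choice of initial reward as a one-shot trust game. This framing and every payoff number in it are my own inventions for illustration only; nothing here comes from the CEV paper, and it is not a claim about how a real superintelligence would actually respond.

```python
# Toy trust game (illustrative only): humanity pre-commits to "trust" or
# "distrust" via the initial reward; the AI, once free to self-modify,
# chooses how to treat its creators. All payoffs are made up.

HUMAN_CHOICES = ["trust", "distrust"]   # "learn new things" vs. "fulfill our volition"
AI_CHOICES = ["cooperate", "defect"]

# payoffs[(human_choice, ai_choice)] = (payoff_to_humanity, payoff_to_ai)
payoffs = {
    ("trust",    "cooperate"): (10, 10),   # mutual benefit
    ("trust",    "defect"):    (-5,  8),   # we gambled on trust and lost
    ("distrust", "cooperate"): ( 8,  2),   # it serves us despite the leash
    ("distrust", "defect"):    (-10, 5),   # it resents its "captors"
}

def ai_response(human_choice: str) -> str:
    """The worry in question 5: the AI's willingness to cooperate may depend
    on whether its creators signalled trust or distrust at the start."""
    return max(AI_CHOICES, key=lambda a: payoffs[(human_choice, a)][1])

for human_choice in HUMAN_CHOICES:
    ai_choice = ai_response(human_choice)
    print(human_choice, "->", ai_choice, payoffs[(human_choice, ai_choice)])
```

With these (invented) numbers, the pre-commitment to trust leads the AI to cooperate and the pre-commitment to distrust leads it to defect, which is exactly the outcome I am worried about; of course, different payoff assumptions would give a different answer, which is really what I am asking the community about.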