What do superintelligences really want? [Link]

In Con­clu­sion:

In the case of hu­mans, ev­ery­thing that we do that seems in­tel­li­gent is part of a large, com­plex mechanism in which we are en­gaged to en­sure our sur­vival. This is so hard­wired into us that we do not see it eas­ily, and we cer­tainly can­not change it very much. How­ever, su­per­in­tel­li­gent com­puter pro­grams are not limited in this way. They un­der­stand the way that they work, can change their own code, and are not limited by any par­tic­u­lar re­ward mechanism. I ar­gue that be­cause of this fact, such en­tities are not self-con­sis­tent. In fact, if our su­per­in­tel­li­gent pro­gram has no hard-coded sur­vival mechanism, it is more likely to switch it­self off than to de­stroy the hu­man race willfully.

Link: physic­sand­cake.word­press.com/​2011/​01/​22/​pavlovs-ai-what-did-it-mean/​

Suzanne Gildert ba­si­cally ar­gues that any AGI that can con­sid­er­ably self-im­prove would sim­ply al­ter its re­ward func­tion di­rectly. I’m not sure how she ar­rives at the con­clu­sion that such an AGI would likely switch it­self off. Even if an ab­stract gen­eral in­tel­li­gence would tend to al­ter its re­ward func­tion, wouldn’t it do so in­definitely rather than switch­ing it­self off?

So imag­ine a sim­ple ex­am­ple – our case from ear­lier – where a com­puter gets an ad­di­tional ’1′ added to a nu­mer­i­cal value for each good thing it does, and it tries to max­i­mize the to­tal by do­ing more good things. But if the com­puter pro­gram is clever enough, why can’t it just rewrite it’s own code and re­place that piece of code that says ‘add 1′ with an ’add 2′? Now the pro­gram gets twice the re­ward for ev­ery good thing that it does! And why stop at 2? Why not 3, or 4? Soon, the pro­gram will spend so much time think­ing about ad­just­ing its re­ward num­ber that it will ig­nore the good task it was do­ing in the first place!
It seems that be­ing in­tel­li­gent enough to start mod­ify­ing your own re­ward mechanisms is not nec­es­sar­ily a good thing!

If it wants to max­i­mize its re­ward by in­creas­ing a nu­mer­i­cal value, why wouldn’t it con­sume the uni­verse do­ing so? Maybe she had some­thing in mind along the lines of an ar­gu­ment by Katja Grace:

In try­ing to get to most goals, peo­ple don’t in­vest and in­vest un­til they ex­plode with in­vest­ment. Why is this? Be­cause it quickly be­comes cheaper to ac­tu­ally fulfil a goal at than it is to in­vest more and then fulfil it. [...] A crea­ture should only in­vest in many lev­els of in­tel­li­gence im­prove­ment when it is pur­su­ing goals sig­nifi­cantly more re­source in­ten­sive than cre­at­ing many lev­els of in­tel­li­gence im­prove­ment.

Link: me­te­uphoric.word­press.com/​2010/​02/​06/​cheap-goals-not-ex­plo­sive/​

I am not sure if that ar­gu­ment would ap­ply here. I sup­pose the AI might hit diminish­ing re­turns but could again al­ter its re­ward func­tion to pre­vent that, though what would be the in­cen­tive for do­ing so?


I left a com­ment over there:

Be­cause it would con­sume the whole uni­verse in an effort to en­code an even larger re­ward num­ber? In the case that an AI de­cides to al­ter its re­ward func­tion di­rectly, max­i­miz­ing its re­ward by means of im­prov­ing its re­ward func­tion be­comes its new goal. Why wouldn’t it do ev­ery­thing to max­i­mize its pay­off, af­ter all it has no in­cen­tive to switch it­self off? And why would it ac­count for hu­mans in do­ing so?

ETA #2:

What else I wrote:

There is ab­solutely no rea­son (in­cen­tive) for it to do any­thing ex­cept in­creas­ing its re­ward num­ber. This in­cludes the mod­ifi­ca­tion of its re­ward func­tion in any way that would not in­crease the nu­mer­i­cal value that is the re­ward num­ber.

We are talk­ing about a gen­eral in­tel­li­gence with the abil­ity to self-im­prove to­wards su­per­hu­man in­tel­li­gence. Of course it would do a long-term risks-benefits anal­y­sis and calcu­late its pay­off and do ev­ery­thing to in­crease its re­ward num­ber max­i­mally. Hu­man val­ues are com­plex but su­per­hu­man in­tel­li­gence does not im­ply com­plex val­ues. It has no in­cen­tive to al­ter its goal.