Limiting an AGI’s Context Temporally

Okay, so I have a proposal for how to advance AI safety efforts significantly.

Humans experience time as exponential decay of utility: a payoff loses half its value with every fixed interval of delay. One dollar now is worth two dollars some time in the future, four dollars further out still, eight dollars beyond that, and so forth. This is the principle behind compound interest. Most likely, any AI entities we create will have a comparable relationship with time.
So: What if we configured an AI’s half-life of utility to be much shorter than ours?
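
To make the idea concrete, here is a minimal sketch of what a “half-life of utility” means. The twenty-year figure for human-like discounting is a made-up placeholder, purely for contrast:

```python
def discounted_utility(raw_utility, delay_seconds, half_life_seconds):
    """Exponential discounting: a payoff loses half its value every half-life."""
    return raw_utility * 0.5 ** (delay_seconds / half_life_seconds)

SIX_MONTHS = 0.5 * 365 * 24 * 3600  # ~1.6e7 seconds

# A human-like discounter (hypothetical 20-year half-life) barely discounts
# a payoff arriving six months from now...
print(discounted_utility(1.0, SIX_MONTHS, 20 * 365 * 24 * 3600))  # ~0.98

# ...while an agent with a one-second half-life values it at essentially zero.
print(discounted_utility(1.0, SIX_MONTHS, 1.0))  # 0.0 (underflows)
```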


Imagine, if you will, this principle applied to a paperclip maximizer. “Yeah, if I wanted to, I could make a ten-minute phone call to kick-start my diabolical scheme to take over the world and make octillions of paperclips. But that would take like half a year to come to fruition, and I assign so little weight to what happens in six months that I can’t be bothered to plan that far ahead, even though I could arrange to get octillions of paperclips then if I did. So screw that, I’ll make paperclips the old-fashioned way.”
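
As a toy version of that internal monologue (the one-hour half-life, the “octillion” of 10^27 paperclips, and the ten-second honest option are all invented for illustration):

```python
def discounted_utility(raw_utility, delay_seconds, half_life_seconds):
    """Exponential discounting: a payoff loses half its value every half-life."""
    return raw_utility * 0.5 ** (delay_seconds / half_life_seconds)

HALF_LIFE = 3600.0  # suppose the maximizer's utility half-life is one hour

# Option A: the diabolical scheme pays off ~10^27 paperclips in six months.
six_months = 0.5 * 365 * 24 * 3600  # ~4,380 half-lives away
value_takeover = discounted_utility(1e27, six_months, HALF_LIFE)

# Option B: make one paperclip the old-fashioned way, ten seconds from now.
value_honest = discounted_utility(1.0, 10.0, HALF_LIFE)

print(value_takeover)  # 0.0 -- the octillions are discounted into oblivion
print(value_honest)    # ~0.998 -- one paperclip soon wins
```

What matters is simply how many half-lives fit inside the delay, which is exactly the quantity the note at the end estimates.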
This approach may prove to be a game-changer: it lets us safely build a “prototype” AGI for testing purposes without endangering the entire world. It improves AGI testing in two essential ways:

  1. Decreases the scope of the AI’s actions, so that if disaster strikes it might be confined to the region around the AGI rather than killing the entire world. This makes safety testing fundamentally safer.

  2. Makes the fruits of the AI’s behavior apparent more quickly, shortening iteration time drastically. If an AI doesn’t care about anything beyond the current day, it will take no more than 24 hours to determine whether it’s dangerous in its current state.

Naturally, finalized AGIs ought to be set so that their half-life of utility resembles ours. But I see no reason why we can’t gradually lengthen it over time as we grow more confident that we’ve taught the AI to not kill us.
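
One way to picture that gradual lengthening, with a tenfold step per cleared round of testing as an arbitrary placeholder rather than a recommendation:

```python
half_life_seconds = 1.0  # start the prototype with a one-second half-life

for test_round in range(1, 10):
    # ...run this round's safety evaluation here; stop lengthening if it fails...
    half_life_seconds *= 10  # placeholder growth factor per cleared round
    print(f"after round {test_round}: half-life = {half_life_seconds:.0e} s")

# After nine cleared rounds the half-life reaches 1e9 seconds -- a few decades,
# i.e. roughly human-scale.
```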

(Note: There are 6×10^27 grams of matter on Earth. Throw in a couple orders of magnitude for the benefit of being undisturbed and this suggests that taking over the world represents a utility bonus of roughly 10^30. This is pretty close to 2^100, which suggests that an AGI will not take over the world if its fastest possible takeover scheme would take more than 100 half-lives. Of course, this is just Fermi estimation here, but it still gives me reason to believe that an AGI with a half-life of, say, one second, won’t end human civilization.)
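
For anyone who wants to check that arithmetic, a quick sketch reusing the note’s own round numbers:

```python
import math

matter_on_earth_grams = 6e27
# Throw in a couple extra orders of magnitude for the benefit of being undisturbed.
utility_bonus = matter_on_earth_grams * 100

# Number of half-lives of delay at which a payoff this size discounts below 1:
print(math.log2(utility_bonus))  # ~98.9, i.e. roughly 100 half-lives

# So with a one-second half-life, a takeover scheme that needs more than a
# couple of minutes to pay off is already worth less than one unit of utility now.
```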