Limiting an AGI’s Context Temporally
Okay, so I have a proposal for how to advance AI safety efforts significantly.
Humans experience time as exponential decay of utility. One dollar now is worth two dollars some time in the future, which is worth four dollars after that same interval again, and so forth. This is the principle behind compound interest. Most likely, any AI entities we create will have a comparable relationship with time.
So: What if we configured an AI’s half-life of utility to be much shorter than ours?
Imagine, if you will, this principle applied to a paperclip maximizer. “Yeah, if I wanted to, I could make a ten-minute phone call to kick-start my diabolical scheme to take over the world and make octillions of paperclips. But that would take like half a year to come to fruition, and I assign so little weight to what happens in six months that I can’t be bothered to plan that far ahead, even though I could arrange to get octillions of paperclips then if I did. So screw that, I’ll make paperclips the old-fashioned way.”
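To make the paperclip maximizer's reasoning concrete, here is a minimal sketch of the comparison it would be making. The one-hour half-life, the helper name `log2_discounted`, and the "one paperclip right now" baseline are all assumptions chosen for illustration; only the six-month delay and the octillion-scale payoff come from the scenario above. The comparison is done in log2 space because the raw discount factor is far too small for a float to hold.

```python
import math

def log2_discounted(utility: float, delay: float, half_life: float) -> float:
    """Return log2 of utility * 0.5 ** (delay / half_life).

    Working in log space avoids floating-point underflow when the
    delay spans thousands of half-lives.
    """
    return math.log2(utility) - delay / half_life

HALF_LIFE_HOURS = 1.0        # assumed half-life of utility (hypothetical)
SIX_MONTHS_HOURS = 182 * 24  # roughly half a year, in hours

# Takeover plan: about an octillion (1e27) paperclips, six months from now.
takeover = log2_discounted(1e27, SIX_MONTHS_HOURS, HALF_LIFE_HOURS)

# Old-fashioned plan: a single paperclip with no delay (assumed baseline).
old_fashioned = log2_discounted(1.0, 0.0, HALF_LIFE_HOURS)

prefers_old_fashioned = old_fashioned > takeover
print(f"log2 value of takeover plan:      {takeover:.1f}")
print(f"log2 value of old-fashioned plan: {old_fashioned:.1f}")
print(f"prefers old-fashioned plan:       {prefers_old_fashioned}")
```

Under these assumed numbers the takeover plan is discounted by more than four thousand half-lives, so even an octillion paperclips ends up worth far less than one paperclip today.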
This approach may prove to be a game-changer in that it allows us to safely make a “prototype” AGI for testing purposes without endangering the entire world. It improves AGI testing in two essential ways:
Decreases the scope of the AI’s actions, so that if disaster happens it might be confined to the region around the AGI rather than killing everyone on Earth. This makes safety testing much safer on a fundamental level.
Makes the fruits of the AI more obvious more quickly, so that iteration time is shortened drastically. If an AI doesn’t care about anything beyond the current day, we will need no more than 24 hours to come to a conclusion as to whether it’s dangerous in its current state.
Naturally, finalized AGIs ought to be set so that their half-life of utility resembles ours. But I see no reason why we can’t gradually lengthen it over time as we grow more confident that we’ve taught the AI to not kill us.
(Note: There are 6x10^27 grams of matter on Earth. Throw in a couple orders of magnitude for the benefit of being undisturbed and this suggests that taking over the world represents a utility bonus of roughly 10^30, measured against what the AGI could get right now. This is pretty close to 2^100, so after 100 half-lives the bonus is discounted down to about one unit, no better than what the AGI could get immediately. That suggests that an AGI will not take over the world if its fastest possible takeover scheme would take more than 100 half-lives. Of course, this is just Fermi estimation here, but it still gives me reason to believe that an AGI with a half-life of, say, one second, won’t end human civilization.)
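The Fermi estimate in the note can be checked with a few lines of arithmetic. This is only a sketch of that back-of-the-envelope calculation: the function name `break_even_half_lives` is made up for illustration, and the factor of 100 is one reading of "a couple orders of magnitude."

```python
import math

EARTH_MASS_GRAMS = 6e27   # figure quoted in the note
UNDISTURBED_BONUS = 100   # "a couple orders of magnitude" (assumed to mean 10^2)

def break_even_half_lives(utility_bonus: float) -> float:
    """Number of half-lives after which a bonus of this size,
    discounted by 0.5 per half-life, shrinks to one unit."""
    return math.log2(utility_bonus)

bonus = EARTH_MASS_GRAMS * UNDISTURBED_BONUS        # about 6e29
n_exact = break_even_half_lives(bonus)              # a bit under 99
n_rounded = break_even_half_lives(1e30)             # a bit under 100

HALF_LIFE_SECONDS = 1.0                             # the example half-life from the note
horizon_seconds = n_rounded * HALF_LIFE_SECONDS     # planning horizon in seconds

print(f"half-lives to cancel a {bonus:.0e} bonus: {n_exact:.1f}")
print(f"half-lives to cancel a 1e30 bonus:  {n_rounded:.1f}")
print(f"break-even horizon at a 1 s half-life: {horizon_seconds:.0f} s")
```

Both versions of the bonus land just under 100 half-lives, so with a one-second half-life any takeover scheme slower than roughly a minute and a half would not be worth it by this estimate.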