[deleted] comments on Trapping AIs via utility indifference

[deleted] 1 Mar 2012 20:14 UTC
0 points
I thought of something that might come up before we even get to the point where we could use this trap.

I’m assuming we have to be reviewing the logs and code while we’re coding it, just so that we can avoid things like “ERROR 22-322: Syntax error, expecting one of the following: ;, (, COMPRESS, DATA, FC, FORMCHAR, NLEVELS, NOPRINT, ORDER, PAGE.” from basic typos.

And the first time it runs without simply crashing and writing an error to the log, we still have to look at the code, log, and basic output for problems that don’t crash the program, like the classic failure “The utility function appears to have been multiplied by negative one.”

There are generally options to suppress external logs, and a self-modifying program might add one of them to its code, either by accident or to see whether it runs faster without logging. But I would normally consider that a bug in itself, and would kill the program and look at the code and at whatever the log recorded before it went quiet.

The reason is that the log going quiet actually happens fairly often in my programming language, and in many cases the cause is a malformed infinite loop that does not itself generate output.

Something like:

x=0;

do until (x>10);

*Increment X;

x=x-1;

end;

This is just a basic error. There’s a - sign instead of a + sign in the increment step, so x never becomes greater than 10 and the loop never ends.

So if my program stopped logging or outputting, I wouldn’t even be ending it out of “Oh no, it’s gone rampant!” I would just assume it was caught in an infinite loop, and would reset it and try to fix it.
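For fixed code, the usual guard against a loop like the one above is an iteration budget, so that the program fails loudly instead of going quiet. Here is a minimal sketch in Python (not the language of the snippet above). The names `run_until` and `ChurnError`, and the default budget of 1000, are invented for illustration:

```python
class ChurnError(RuntimeError):
    """Raised when a loop exceeds its iteration budget."""


def run_until(step, done, state, max_iterations=1000):
    """Apply step to state until done(state) is true, or fail loudly.

    max_iterations is a hypothetical budget chosen for this sketch.
    """
    for _ in range(max_iterations):
        state = step(state)  # do-until semantics: the body runs before the check
        if done(state):
            return state
    raise ChurnError(
        f"no exit after {max_iterations} iterations; last state {state!r}"
    )


# The buggy loop: x is decremented, so x > 10 is never reached.
try:
    run_until(lambda x: x - 1, lambda x: x > 10, 0)
except ChurnError as err:
    print("stopped:", err)

# The intended loop terminates normally and returns 11.
print(run_until(lambda x: x + 1, lambda x: x > 10, 0))
```

The point of the budget is that the failure shows up as an error in the log, which is easier to act on than silence.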

It’s as if God turns his back to see if Eve does anything interesting while he’s not looking. God realizes that he hasn’t heard anything from Eve in 30 minutes, and turns back around… and Eve has gotten herself trapped in a multimile-deep pit, and is still digging. Because according to Eve’s current brain, you go up by digging, and she needs to go up to take a closer look at the apple, so that she knows exactly what it is God told her not to eat.

I guess another way of expressing what I was considering is that turning your back on the AI might allow you to know what it does when it isn’t being looked at, but it also allows it to get caught in useless churning.

So, if we were considering writing a self-modifying AI, is there any way to guard it against this, so that we no longer have to watch the logs just for basic checks against churn? That would let us get it stable enough to consider variants like Stuart_Armstrong’s idea and see whether it behaves differently when logged compared to when unlogged. I understand how to do that for fixed code, but I’m not sure how you would do it for self-modifying code.
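For the fixed-code case, one common guard is an external watchdog: a separate process treats each line of output as a heartbeat and kills the program if the heartbeat stops for too long. Below is a minimal Python sketch under stated assumptions. `run_with_watchdog` and the `quiet_limit` threshold are hypothetical names and values, and the watched program is assumed to be a Python snippet run as a child process:

```python
import queue
import subprocess
import sys
import threading


def run_with_watchdog(code, quiet_limit=5.0):
    """Run Python source `code` in a child process; kill it if output goes quiet.

    Returns (status, lines), where status is "finished" or "killed: quiet".
    quiet_limit (seconds) is a hypothetical threshold chosen for illustration.
    """
    child = subprocess.Popen(
        [sys.executable, "-u", "-c", code],  # -u: unbuffered child output
        stdout=subprocess.PIPE,
        text=True,
    )
    lines = queue.Queue()

    def pump():
        # Forward each output line to the watchdog; None marks end of output.
        for line in child.stdout:
            lines.put(line.rstrip("\n"))
        lines.put(None)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    seen = []
    status = "finished"
    while True:
        try:
            line = lines.get(timeout=quiet_limit)
        except queue.Empty:
            child.kill()  # no heartbeat within the limit: assume churn
            status = "killed: quiet"
            break
        if line is None:
            break
        seen.append(line)

    child.wait()
    reader.join()
    child.stdout.close()
    return status, seen


if __name__ == "__main__":
    stuck = "print('start')\nx = 0\nwhile not x > 10:\n    x = x - 1\n"
    print(run_with_watchdog(stuck, quiet_limit=2.0))
```

A watchdog like this only detects silence, so it cannot tell slow useful work from churn, and it relies on sitting outside the code it watches.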