How long will it take Betty to FOOM? She may start out dumb enough to unwittingly do Abe’s bidding, but if we’ve reached the stage where Abe and Betty exist, I’d expect that either 1) Betty will rapidly become smart enough to fix her design and avoid furthering Abe’s unfriendly goals or 2) Betty will visibly be much less intelligent than Abe (e.g., she’ll be incapable of creating an AI, whereas Abe has created her).
Well, I had certainly imagined Betty as less intelligent than Abe when writing this, but you make a good point with (1). If Betty is still programmed to make herself more intelligent, she’d have to start out dumb, probably dumb enough for humans to notice.
To give a concrete example with the chess program: The Turk could change itself to evaluate positions with a linear function of the number of pieces, using one-move lookahead. No matter how well the weights of that function are optimized, it’s never going to compare with deeper lookahead.
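To make that concrete, here is a minimal sketch in Python using the python-chess library. The piece values and the test position are my own illustrative choices, not from the original discussion: a depth-1 searcher guided by a linear material count grabs a defended pawn, while the very same evaluation with one more ply of lookahead sees the recapture and declines.

```python
import chess  # pip install python-chess

# Classical material values; the evaluation below is a weighted
# count of the pieces on the board, positive for White.
PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board: chess.Board) -> int:
    """Linear function of piece counts: sum of values, White minus Black."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == chess.WHITE else -value
    return score

def negamax(board: chess.Board, depth: int):
    """Plain negamax over `material`; returns (score, best_move) from the
    side to move's perspective. Deliberately minimal: no quiescence search,
    no mate scoring."""
    if depth == 0 or board.is_game_over():
        sign = 1 if board.turn == chess.WHITE else -1
        return sign * material(board), None
    best_score, best_move = -float("inf"), None
    for move in board.legal_moves:
        board.push(move)
        score = -negamax(board, depth - 1)[0]
        board.pop()
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move

# Illustrative position (my own construction): White's queen on e3 can
# grab the black pawn on e5, but that pawn is defended by the pawn on d6.
board = chess.Board("4k3/8/3p4/4p3/8/4Q3/8/4K3 w - - 0 1")
print(negamax(board, 1))  # depth 1: plays Qxe5, "winning" a pawn
print(negamax(board, 2))  # depth 2: sees dxe5 in reply and plays a quiet move
```

No tuning of the linear weights can fix the depth-1 player: the loss of the queen only becomes visible two plies out, which is exactly the structural ceiling the example points to.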
On the other hand, there’s no particular reason Betty should continue to self-improve that I can see.
A subgoal that is useful for achieving many primary goals is to improve one’s general goal-achieving ability.
An attempt to cripple an FAI by limiting its general intelligence would be noticed, because the humans would expect it to FOOM; and if it actually does FOOM, it will be smart enough to fix its own design.
A sneakier unfriendly AI might instead design an FAI with a stupid prior, one containing blind spots the uFAI can exploit. So you would want your Friendliness test to look not just at the goal system, but at every module of the supposed FAI, including its epistemology and decision theory.
But even a thorough test does not make it a good idea to run a supposed FAI designed by an uFAI: it allows the uFAI to turn every bit of our uncertainty about the supposed FAI toward its own purpose.