A significant amount of discussion on Less Wrong appears to be of the following form:
1: How do we make a superintelligent AI perform more as we want it to, without reducing it to a paperweight?
Note: reducing it to a paperweight refers to the periodically referenced “put the superintelligence in a box, then delete it if it sends any output outside the box” school of AI safety.
Something really obvious occurred to me, and it seems so basic that there has to be an answer somewhere, but I don’t know what to look under. What if we try flipping the question and asking this?
2: How do we make an AI that obediently performs as we want it to, but does so smarter, while maintaining its obedience?
I’m assuming that’s known and discussed. Is there a name for it? Maybe a flaw that I’m not seeing?
It does seem like an interesting question. But the most obvious flaw is that we still don’t have the starting point: software does what we tell it to do, not what we want, and those are usually different. I don’t immediately see any way to get there without superintelligence.
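To make that concrete, here is a minimal sketch in Python (the objective and the loophole are invented for illustration, not taken from anyone’s proposal): we tell an optimizer to minimize reported classification errors, and the literal specification lets it “win” by never classifying anything.

```python
def reported_errors(predictions, labels):
    # The literal specification: count predictions that disagree with
    # their labels. Abstentions (None) are silently skipped -- an
    # unintended loophole in what we *told* the system to minimize.
    return sum(1 for p, y in zip(predictions, labels)
               if p is not None and p != y)

labels = [0, 1, 1, 0, 1]
honest = [0, 1, 0, 0, 1]           # a genuine attempt: one error
degenerate = [None] * len(labels)  # abstain on everything: zero "errors"

print(reported_errors(honest, labels))      # -> 1
print(reported_errors(degenerate, labels))  # -> 0, exactly what we asked for
```

The degenerate solution is perfectly obedient to the specification; it just isn’t what we wanted.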
Holden Karnofsky proposed starting with an Oracle AI that tells us what it would do if we gave it different goal systems. But if we avoided giving it any utility function of its own, the programmers would need to not only think of every question (regarding every aspect of “what it would do”), but also create an interface for each sufficiently new answer. I’ll go out on a limb and say this will never happen (much less happen correctly) if someone in the world can just create an ‘Agent AI’.
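For what it’s worth, the interface problem can be put as a toy sketch (all names here are hypothetical, not from Karnofsky’s proposal): humans must pre-build a presentation layer for every kind of answer the Oracle might give, so any sufficiently new answer type has nowhere to go.

```python
# Hypothetical renderers humans wrote in advance, one per anticipated
# answer type. The Oracle itself has no utility function and no
# ability to extend this table on its own.
RENDERERS = {
    "number":    lambda a: f"Estimated value: {a}",
    "yes_or_no": lambda a: "Yes." if a else "No.",
}

def present(answer_type, answer):
    if answer_type not in RENDERERS:
        # The Oracle produced a kind of answer nobody anticipated,
        # so there is simply no interface for it.
        raise NotImplementedError(f"no interface for {answer_type!r}")
    return RENDERERS[answer_type](answer)

print(present("yes_or_no", True))          # works: we planned for this
print(present("novel_plan", {"step": 1}))  # fails: we didn't
```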
How do we make an AI that obediently performs as we want it to, but does so smarter, while maintaining its obedience?
That depends on what you mean by “smarter”. Is it merely good at finding more efficient ways to fulfill your wish, or is it also able to realize that some literal interpretations of your wish are not what you actually want to happen (though perhaps you aren’t smart enough to realize it)? And in the latter case, will it efficiently follow the literal interpretation anyway?
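As a toy illustration of that distinction (the action names and scores are made up for the example): an agent can be smart enough to flag an action as unintended and still execute the literal interpretation; what it does depends on which objective actually drives the choice, not on what it can recognize.

```python
# Toy model: each action has a score under the literal wish
# ("make the patient be recorded as cured") and a flag for whether
# it matches the intended wish ("make the patient healthy").
actions = {
    "cure the patient":        {"literal_score": 8,  "matches_intent": True},
    "edit the medical record": {"literal_score": 10, "matches_intent": False},
}

def choose(actions, intent_constrains_choice):
    # If intent constrains the choice, drop actions the agent itself
    # recognizes as unintended; otherwise optimize the literal score alone.
    pool = {name: info for name, info in actions.items()
            if info["matches_intent"] or not intent_constrains_choice}
    return max(pool, key=lambda name: pool[name]["literal_score"])

print(choose(actions, intent_constrains_choice=False))  # 'edit the medical record'
print(choose(actions, intent_constrains_choice=True))   # 'cure the patient'
```

Recognizing the mismatch and being constrained by it are separate properties, and only the second is obedience in the sense the question needs.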