If we observe the AI drawing a bunch of correct conclusions, we can pretty quickly build up evidence that the AI is not insane, and that whatever it is actually thinking is the truth. Most of the bugs I discover in the software I write (I know there is a selection bias here, but I know from experience that my testing is thorough enough to catch almost all bugs that have non-exceedingly-rare consequences for normal use) are the kind that are obvious right away. It takes a very specific kind of bug to be capable of changing the output in a consequential way and yet not show up as wrong on the first few tests.
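To make "a very specific kind of bug" concrete, here is a minimal sketch of one that changes the output in a consequential way yet passes the first few tests. The leap-year example and every name in it are hypothetical illustrations, not anything from the software discussed above.

```python
def is_leap_year_buggy(year: int) -> bool:
    # Hypothetical buggy version: it forgets the century rule.
    return year % 4 == 0


def is_leap_year(year: int) -> bool:
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The "first few tests" a programmer is likely to try all pass.
first_few_tests = [2011, 2012, 2013, 2016]
passes_first_tests = all(
    is_leap_year_buggy(y) == is_leap_year(y) for y in first_few_tests
)

# The bug only shows up on rare inputs.
disagreements = [
    y for y in range(1800, 2101) if is_leap_year_buggy(y) != is_leap_year(y)
]
```

Across those 301 years the two versions disagree on only three inputs, which is roughly the shape of bug that quick testing tends to miss.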
But lots of UFAIs would pretend to be FAIs, so it would be much harder to get evidence that the AI was friendly.
Could I add a brief thought? Even if you could program an AI with no bugs, that AI could make “mistakes”. What we consider a mistake depends on the practical purpose we have in mind for the AI. What we consider a bug depends on the operational purpose of the segment of code the bug appears in. But the operational purpose may not match up with our practical purpose. So code which achieves its operational purpose might not achieve its practical purpose.
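As a rough sketch of that gap, assume a hypothetical product-ranking task. Each function below does exactly what its docstring says (its operational purpose), but the result still misses the practical purpose of recommending a reliably good product. All names and data here are made up for illustration.

```python
def average_rating(ratings: list[int]) -> float:
    """Operational purpose: return the arithmetic mean of the ratings."""
    return sum(ratings) / len(ratings)


def pick_best(products: dict[str, list[int]]) -> str:
    """Operational purpose: return the product with the highest average."""
    return max(products, key=lambda name: average_rating(products[name]))


# Hypothetical data: one product with 1000 ratings, one with a single rating.
products = {
    "well_tested": [5] * 980 + [4] * 20,  # mean 4.98
    "barely_rated": [5],                  # mean 5.0
}

# Both functions are bug-free with respect to their stated purposes...
unit_check = average_rating([4, 5]) == 4.5

# ...yet the recommendation rests on a single rating.
best = pick_best(products)
```

No line of this code is a bug in the narrow sense. The "mistake" only exists relative to what the programmer had in mind.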
Let me take this a bit deeper. The programmer may have an overarching intended purpose for the entire project of building the AI. They may also have an intended purpose for the entirety of the code written. They may also have an intended purpose for the programs A, B, C… in the code. They may… and so forth. At every stage of increasing generality, there is a potential for the more general purpose to go unfulfilled. Your programs might each do what you want them to, but in combination they might fail to detect an absolute denial macro. The entirety of your code might resemble an AI, but it would take a computer far more powerful than any in existence to run it. You might get the whole project up and working, only for the AI to decide to commit suicide. Or, more relevantly, you might get the whole project working, but the AI turns out to be dumb, because the way you thought an AI ought to think in order to be clever didn’t work out. You succeeded in your intended purpose of creating an AI that thought in the way you intended, but one of the more general purposes, creating an AI that was clever, failed.
So it would seem that bug checking is more prone to human error than you implied, especially as intended purposes are themselves often vague. I don’t claim, however, that these challenges are insurmountable. Also, if anyone is uncomfortable with the phrase “intended purpose” as I used it, feel free to replace it with “what the programmer had in mind”, as that is all I meant by it.
Hi. Checking back on this account on a whim after a long time of not using it. You’re right. 2012!Mestroyer was a noob and I am still cleaning up his bad software.