A plan for world domination seems like something that can’t be concealed from its creators. Lying is not an option if your algorithms are open to inspection.
This is just naive. Source code can be available and yet either the maliciousness is not obvious (see the Underhanded C Contest) or the code does not prove what you think it proves (see Reflections on Trusting Trust, just for starters). That’s assuming you are even inspecting all the existing code, rather than a stub left behind to look like an AI.
You are arguing past each other. XiXiDu is saying that a programmer can create software that can be inspected reliably. We are very close to having provably correct kernels and compilers, which would make it practical to build reliably sandboxed software, such that we can look inside the sandbox and see that the software’s data structures are what they ought to be.
It is separately true that not all software can be reliably understood by static inspection, which is all that the Underhanded C Contest demonstrates. I would stipulate that the same is true at run-time. But that’s not the case here. Presumably developers of a large complicated AI will design it to be easy to debug—I don’t think they have much chance of a working program otherwise.
No, you are ignoring Xi’s context. The claim is not about what a programmer on the team might do, it is about what the AI might write. Notice that the section starts ‘The goals of an AI will be under scrutiny at any time...’
Yes. I thought Xi’s claim was that if you have an AI and put it to work writing software, the programmers supervising the AI can look at the internal “motivations”, “goals”, and “planning” data structures and see what the AI is really doing. Obfuscation is beside the point.
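To make the claim concrete, here is a toy sketch of what “looking at the internal goal and plan data structures” could mean. This is purely illustrative: every name here is hypothetical, and a real AI’s internals would of course be vastly more opaque than a pair of plain Python lists.

```python
# Toy "transparent agent": its goals and plan are plain data that a
# supervisor can read directly, rather than inferring intent from behaviour.
# All names and structures here are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class Agent:
    goals: list = field(default_factory=list)  # what the agent is trying to achieve
    plan: list = field(default_factory=list)   # the concrete steps it intends to take

    def deliberate(self):
        # In a real system this step would be learned and hard to interpret;
        # here it is trivially readable, which is exactly the disputed property.
        self.plan = [f"work toward: {g}" for g in self.goals]

FORBIDDEN = {"world domination"}

def supervisor_approves(agent: Agent) -> bool:
    # The supervisor inspects the agent's actual data structures
    # instead of trusting its outward behaviour or its own reports.
    return not any(g in FORBIDDEN for g in agent.goals)

agent = Agent(goals=["write software", "world domination"])
agent.deliberate()
print(supervisor_approves(agent))  # False: the bad goal is visible on inspection
```

The detractors’ objection, in these terms, is that nothing forces the real goals to live in a field labelled `goals`.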
I agree with you and XiXiDu that such observation should be possible in principle, but I also sort of agree with the detractors. You say,
Presumably developers of a large complicated AI will design it to be easy to debug...
Oh, I’m sure they’d try. But have you ever seen a large software project? There are usually mountains and mountains of code running in parallel on multiple nodes all over the place. Pieces of it are written with good intentions in mind; other pieces are written in a caffeine-fueled fog two days before the deadline, peppered with years-old comments to the effect of, “TODO: fix this when I have more time”. When the code breaks in some significant way, it’s usually easier to rewrite it from scratch than to debug the fault.
And that’s just enterprise software, which is orders of magnitude less complex than an AGI would be. So yes, it should be possible to write transparent and easily debuggable code in theory, but in practice I predict that people would write code the usual way instead.
You are just lying. Some of what I wrote:
Why wouldn’t the humans who created it be able to use the same algorithms that the AI uses to predict what it will do?
The goals of an AI will be under scrutiny at all times. It seems very implausible that scientists, a company, or the military are going to create an AI and then just let it run without bothering about its plans. An artificial agent is not a black box, like humans are, where one can only guess at its real intentions.
A plan for world domination seems like something that can’t be concealed from its creators. Lying is not an option if your algorithms are open to inspection.
...in this particular instance they were straightforward enough.
I happily admit it when I see a straightforward argument. Take, for example, his argument about double-counting probabilities: I was simply wrong there. But the rest of the comment did not come close to constituting a good argument against anything I wrote in the OP, and some of it was just straw men.
What asr wrote just put the same point much more clearly.
It is incredibly sad that your comment is at 0 while a bunch of fallacious accusations by gwern are at +20.
I hate fallacious arguments by gwern at least as much as the next guy, but in this particular instance they were straightforward enough.