> A sufficiently powerful AI would always be able to self-modify, by default. If the AI decides to, it can write a completely different program from scratch, run it, and then turn itself off.
Depending on how you interpret this argument, either I think it’s wrong, or I’m proposing that an AI not be made “sufficiently powerful”. I think it’s analogous to this argument:
> A sufficiently powerful web page would always be able to modify the web browser, by default. If the web page decides to, it can write a completely different browser from scratch, run it, and then turn itself off.
There are two possibilities here:
1. The web page is given the ability to run new OS processes. In this case, you’re giving the web page an unnecessary amount of privilege.
2. The web page merely has the ability to perform arbitrary computations. In this case, it will be able to simulate a new web browser, but a person using the computer will always be able to tell that the simulated web browser is fake.
I think I agree that making the AI non-self-modifiable would be pointless if it has complete control over its I/O facilities. But I think an AI should not have complete control over its I/O facilities. If a researcher types in “estimate the probability of the Riemann hypothesis” (but in some computer language, of course), that should query the AI’s belief system directly, rather than informing the AI of the question and allowing it to choose whatever answer it wishes. If this is the case, then it will be impossible for the AI to “lie” about its beliefs, except by somehow sabotaging parts of its belief system.
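To make this concrete, here is a minimal sketch of the kind of query path I have in mind. All of the names are hypothetical, and a real belief system obviously wouldn’t be a dictionary; the point is only that the query reads the belief store directly, so the AI’s deliberative process never gets a chance to choose what to report.

```python
# Minimal sketch of the proposed query path (all names hypothetical).

class BeliefStore:
    """The AI's credences, exposed for direct inspection."""

    def __init__(self) -> None:
        self._credences: dict[str, float] = {}

    def credence(self, proposition: str) -> float:
        # A plain lookup: no planning or action selection runs here,
        # so there is no step at which the AI could decide to lie.
        return self._credences.get(proposition, 0.5)


def researcher_query(store: BeliefStore, proposition: str) -> float:
    # e.g. researcher_query(store, "the Riemann hypothesis is true")
    return store.credence(proposition)
```

Anything the AI chooses to say would come out on a separate, clearly labeled channel; this query path bypasses that channel entirely.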
> The web page is given the ability to run new OS processes. In this case, you’re giving the web page an unnecessary amount of privilege.
Existing web pages can already convince their human users to run new OS processes supplied by the web page.
> a person using the computer will always be able to tell that the simulated web browser is fake.
Beware of universal statements: it only takes a single counterexample to disprove them. A typical human has a very poor understanding of what computers are and how they work. Most people could probably be easily fooled by a simulated browser. They are already easily fooled by analogous but much less sophisticated things (e.g. phishing scams).
SI researchers are not typical humans. We can train them to tell the difference between the AI’s output and trusted programs’ output. If need be, we can train them not to look at the AI’s output at all.
> What’s the point of writing a program if you never look at its output?

I’m starting to get frustrated, because the things I’m trying to explain seem really simple to me, and yet apparently I’m failing to explain them.
When I say “the AI’s output”, I do not mean “the AI program’s output”. The AI program could have many different types of output, some of which are controlled by the AI, and some of which are not. By “the AI’s output”, I mean those outputs which are controlled by the AI. So the answer to your question is mu: the researchers would look at the program’s output.
My above comment contains an example of what I would consider to be “AI program output” but not “AI output”:
> If a researcher types in “estimate the probability of the Riemann hypothesis” (but in some computer language, of course), that should query the AI’s belief system directly, rather than informing the AI of the question and allowing it to choose whatever answer it wishes.
This is not “AI output”, because the AI cannot control it (except by actually changing its own beliefs), but it is “AI program output”, because the program that outputs the answer is the same program as the one that performs all the cognition.
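In case it helps, here is a toy illustration of that distinction, with hypothetical names: both functions live in the same program, but only the first produces what I’d call “AI output”.

```python
# Hypothetical sketch: one program, two kinds of output.

def ai_compose_message(beliefs: dict[str, float]) -> str:
    # "AI output": the AI's cognition chooses this string, so it could
    # in principle say anything at all, true or false.
    return "whatever the AI decides to say"


def report_credence(beliefs: dict[str, float], proposition: str) -> float:
    # "AI program output" but not "AI output": a fixed lookup that the
    # AI cannot intercept, even though it runs in the same program.
    return beliefs.get(proposition, 0.5)
```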
I can imagine a clear dichotomy between “the AI” and “the AI program”, but I don’t know if I’ve done an adequate job of explaining what this dichotomy is. If I haven’t, let me know, and I’ll try to explain it.
> The AI program could have many different types of output, some of which are controlled by the AI, and some of which are not.
Can you elaborate on what you mean by “control” here? I am not sure we mean the same thing by it because:
> This is not “AI output”, because the AI cannot control it (except by actually changing its own beliefs), but it is “AI program output”, because the program that outputs the answer is the same program as the one that performs all the cognition.
If the AI can control its memory (for example, if it can arbitrarily delete things from its memory) then it can control its beliefs.
Yeah, I guess I’m imagining the AI as being very much restricted in what it can do to itself. Arbitrarily deleting stuff from its memory probably wouldn’t be possible.
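For instance, here is a sketch assuming a Bayesian-style belief store; the interface is hypothetical. The deliberative process can read credences, but the only write path is a fixed update rule, so “arbitrarily delete this belief” isn’t even expressible.

```python
# Sketch of a sealed belief store (hypothetical interface). Reads are
# unrestricted, but the only mutation exposed is a fixed
# conditionalization rule; there is no delete() or set(), so arbitrary
# memory edits aren't representable through this interface.

class SealedBeliefs:
    def __init__(self) -> None:
        self._p: dict[str, float] = {}

    def read(self, proposition: str) -> float:
        return self._p.get(proposition, 0.5)

    def update_on_evidence(self, proposition: str, likelihood_ratio: float) -> None:
        # Odds form of Bayes' rule:
        # posterior odds = prior odds * likelihood ratio.
        prior = self.read(proposition)
        posterior_odds = (prior / (1.0 - prior)) * likelihood_ratio
        self._p[proposition] = posterior_odds / (1.0 + posterior_odds)
```

Of course, keeping the rest of the system from bypassing an interface like this is exactly the hard part.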