And S is supposed to know its own source code via a quining-type trick.
This phrase got me thinking in another completely irrelevant direction. If you know your own source code by quining, how do you know that it’s really your source code? How does one verify such things?
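For concreteness, here is a minimal sketch of the quining trick itself (Python, my own illustration rather than anything from the original discussion): the program holds a template string and applies the template to itself, so the variable my_source ends up equal to the program's full source text.

```python
# Minimal quine-style self-knowledge: the program reconstructs its own
# source text from a template that also describes the template itself.
template = 'template = %r\nmy_source = template %% template\nprint(my_source)'
my_source = template % template  # equals the full text of this program
print(my_source)
```

Running it prints exactly the file's contents, but note that nothing in the construction certifies that my_source matches the program as it actually exists on disk or in memory, which is the question being asked here.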
Here’s a possibly more relevant variant of the question: we human beings don’t have access to our own source code via quining, so how are we supposed to make decisions?
My thoughts on this so far are that we need to develop a method of mapping an external description of a mathematical object to what it feels like from the inside. Then we can say that the consequences of “me choosing option A” are the logical consequences of all objects with the same subjective experiences/memories as me choosing option A.
I think the quining trick may just be a stopgap solution, and the full solution even for AIs will need to involve something like the above. That’s one possibility that I’m thinking about.
I guess the ability to find oneself in the environment depends on nothing strange happening in the environments you care about (the “probable” ones), so that you can simply pattern-match the things that qualify as you, starting with an extremely simple idea of what “you” is, in some sense built into our minds by evolution. But in general, if there are tons of almost-you running around, you need the exact specification to figure out which of them you actually control.
This is basically the same idea I want to use to automatically extract human preference, even though strictly speaking one already needs the full preference to recognize an instance of it.
Then we can say that the consequences of “me choosing option A” are the logical consequences of all objects with the same subjective experiences/memories as me choosing option A.
Not exactly… Bob and I may have identical experiences/memories, but I have a hidden rootkit installed that makes me defect, and Bob doesn’t.
Maybe it would make more sense to inspect the line of reasoning that leads to my defection, and ask a) would this line of reasoning likely occur to Bob? b) would he find it overwhelmingly convincing? This is kinda like quining, because the line of reasoning must refer to a copy of itself in Bob’s mind.
This just considers the “line of reasoning” as a whole agent (part of the original agent, but not controlled by the original agent), and again assumes perfect self-knowledge by that “line of reasoning” sub-agent (supplied by the bigger agent, perhaps, but taken on faith by the sub-agent).
What is the “you” that is supposed to verify that? It’s certainly possible if “you” already have your source code via the quine trick, so that you just compare it with the one given to you. On the other hand, if “you” are a trivial program that is not able to do that and answers “yes, it’s my source code all right” unconditionally, there is nothing to be done about that. You have to assume something about the agent.
“What is the you” is part of the question. Consider it in terms of counterfactuals. Agent A is told via quining that it has source code S. We’re interested in how to implement A so that it outputs “yes” if S is really its source code, but would output “no” if S were changed to S’ while leaving the rest of the agent unchanged.
In this formulation the problem seems to be impossible to solve, unless the agent has access to an external “reader oracle” that can just read its source code back to it. Guess that answers my original question, then.
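To make the difficulty concrete, here is a toy sketch (my own, with hypothetical names) of the naive verifier: the agent’s only handle on “its own source” is an embedded copy planted by the quine construction, so verification reduces to comparing the claim against that copy.

```python
# Hypothetical sketch, not from the thread: EMBEDDED_SOURCE stands in for
# the copy of A's source planted by the quine trick; verification is just
# string comparison against it.
EMBEDDED_SOURCE = "...full text of agent A, produced by the quine construction..."

def verify(claimed_source: str) -> str:
    """Answer "yes" iff the claim matches the embedded copy."""
    return "yes" if claimed_source == EMBEDDED_SOURCE else "no"

# The counterfactual above: edit EMBEDDED_SOURCE to some S' while leaving
# verify() unchanged. The modified agent now answers "yes" to S' and "no"
# to its actual source; the comparison tracks the embedded copy, not the
# agent itself, which is why an external "reader oracle" seems needed.
```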