But when I started thinking about an AI with this utility function, I became very confused. How exactly do you express this concept of “me” in the code of a utility-maximizing agent? The problem sounds easy enough: it doesn’t refer to any mystical human qualities like “consciousness”, it’s purely a question about programming tricks, but still it looks quite impossible to solve. Any thoughts?
It refers to mystical human qualities like “me” and “think”. Basically I put it in the exact same category as ‘consciousness’.
No, it doesn’t. I’m not interested in replicating the inner experience of humans. I’m interested in something that can be easily noticed and tested from the outside: a program that chooses the actions that allow the program to keep running. It just looks like a trickier version of the quine problem. Do you think that one’s impossible as well?
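For readers who haven’t met the quine problem: a quine is a program whose output is exactly its own source text. The sketch below is one common two-line Python quine, held as a string so it can be run and checked from the outside. The helper names `run_and_capture` and `is_quine` are made up for this illustration.

```python
import contextlib
import io

# Source text of a classic two-line Python quine, stored as data so that an
# outside observer can run it and compare the output with the source.
QUINE_SOURCE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run_and_capture(source):
    """Execute `source` in a fresh namespace and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


def is_quine(source):
    """A program is a quine if what it prints equals its own source text."""
    return run_and_capture(source) == source
```

The check is purely external: nothing about it depends on what the program “thinks” it is doing, only on whether its output matches its text.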
If you want this to work in the real world, not just a much simpler computational environment, then for starters: what counts as a “program” “running”? And what distinguishes “the” program from other possible programs? These seem likely to be in the same category as (not to mention subproblems of) consciousness, whatever that category is.
Right now I’d be content with an answer in some simple computational environment. Let’s solve the easy problem before attempting the hard one.
My observation is just that the process you’re going through here in taking the “I think therefore I am” and making it into a descriptive and testable system is similar to the process others may go through to find the simplest way to have a ‘conscious’ system. In fact many people would resolve ‘conscious’ to a very similar kind of system!
I do not think either is impossible to do once you make, shall we say, appropriate executive decisions regarding resolving the ambiguity in “me” or “conscious” into something useful. In fact, I think both are useful problems to look at.
It’s not hard to design a program with a model of the world that includes itself (though actually coding it requires more effort). The first step is to forget about self-modeling, and just ask, how can I model a world with programs? Then later on you put that model in a program, and then you add a few variables or data structures which represent properties of that program itself.
None of this solves problems about consciousness, objective referential meaning of data structures, and so on. But it’s not hard to design a program which will make choices according to a utility function which refers in turn to the program itself.
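A minimal sketch of that design, assuming a toy world where programs are just names in a set and a program can be scheduled for termination. The `World` class, the three actions, and the transition rule are all invented for illustration; the only point is that the utility function takes the program’s own name as an ordinary argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    running: frozenset       # names of programs currently running
    kill_pending: frozenset  # names scheduled for termination on the next tick


# Hypothetical action set for this toy world.
ACTIONS = ("idle", "cancel_kill", "exit")


def step(world, actor, action):
    """Predict the next world state after `actor` takes `action`."""
    running, pending = world.running, world.kill_pending
    if action == "exit":
        running = running - {actor}
    elif action == "cancel_kill":
        pending = pending - {actor}
    # Anything still scheduled for termination stops running this tick.
    return World(running=running - pending, kill_pending=frozenset())


def utility(world, self_name):
    """1 if the program called `self_name` is still running, else 0."""
    return 1 if self_name in world.running else 0


def choose_action(world, self_name):
    """Pick the action whose predicted outcome has the highest utility."""
    return max(ACTIONS, key=lambda a: utility(step(world, self_name, a), self_name))
```

The world model knows nothing special about the agent. Self-reference enters only when `choose_action` is called with the agent’s own name.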
Well, I don’t want to solve the problem of consciousness right now. You seem to be thinking along correct lines, but I’d appreciate it if you gave a more fleshed out example—not necessarily working code, but an unambiguous spec would be nice.
Getting a program to represent aspects of itself is a well-studied topic. As for representing its relationship to a larger environment, two simple examples:
1) It would be easy to write a program whose “goal” is to always be the biggest CPU hog. All it has to do is constantly run a background calculation of adjustable computational intensity, periodically consult its place in the rankings, and if it’s not number one, increase its demand on CPU resources.
2) Any nonplayer character in a game which fights to preserve itself is also engaged in a limited form of self-preservation. And the computational mechanisms for this example should be directly transposable to a physical situation, like robots in a gladiator arena.
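The first example can be sketched against a simulated process table rather than a real operating system. Everything here is an assumption made for illustration: the table is a plain dict from process name to CPU demand, and `rank_of`, `hog_tick` and `run_hog` are hypothetical names. The program finds “itself” only through an identifier it was handed, much as a real process would use its PID.

```python
def rank_of(table, pid):
    """1-based rank of `pid` when processes are sorted by descending demand."""
    ordered = sorted(table, key=lambda p: table[p], reverse=True)
    return ordered.index(pid) + 1


def hog_tick(table, my_pid, increment=10):
    """One pass of the control loop: raise own demand if not ranked first."""
    if rank_of(table, my_pid) != 1:
        table[my_pid] += increment
    return table[my_pid]


def run_hog(table, my_pid, max_ticks=100):
    """Repeat the control loop until ranked first; return the ticks it took."""
    for tick in range(max_ticks):
        if rank_of(table, my_pid) == 1:
            return tick
        hog_tick(table, my_pid)
    return max_ticks
```

For instance, calling `run_hog(table, "hog")` on a table where `"hog"` starts with the lowest demand raises that entry step by step until it tops the ranking. The loop never needs to know that `my_pid` refers to the code running it.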
All these examples work through indirect self-reference. The program or robot doesn’t know that it is representing itself. This is why I said that self-modeling is not the challenge. If you want your program to engage in sophisticated feats of self-analysis and self-preservation—e.g. figuring out ways to prevent its mainframe from being switched off, asking itself whether a particular port to another platform would still preserve its identity, and so on—the hard part is not the self part. The hard part is to create a program that can reason about such topics at all, whether or not they apply to itself. If you can create an AI which could solve such problems (keeping the power on, protecting core identity) for another AI, you are more than 99% of the way to having an AI that can solve those problems for itself.
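To make the last point concrete, here is a deliberately trivial sketch in which protecting oneself is the same routine as protecting anyone else, called with a different argument. The threat labels, the `COUNTERMEASURES` table and both function names are hypothetical. A real system would need actual reasoning where this one has a lookup table, and that reasoning is exactly the hard part described above.

```python
# Hypothetical mapping from a known threat to the action that neutralises it.
COUNTERMEASURES = {
    "power_cut": "switch_to_backup_power",
    "disk_full": "free_disk_space",
}


def plan_protection(threats, target):
    """Return the actions that neutralise every known threat to `target`.

    `threats` maps an agent name to the set of threat labels it faces.
    """
    return [
        COUNTERMEASURES[t]
        for t in sorted(threats.get(target, ()))
        if t in COUNTERMEASURES
    ]


def plan_self_protection(threats, my_name):
    """Self-preservation is the general planner applied to one's own name."""
    return plan_protection(threats, my_name)
```

All of the work lives in `plan_protection`. The “self” version adds nothing beyond choosing which name to pass in.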
This concept is extremely complex (for example, which “outside” are you talking about?).
You seem to be reading more than I intended into my original question. If the program is running in a simulated world, we’re on the outside.
Yes, using a formal world simplifies this a lot.