A recent comment about Descartes inspired this thought: the simplest possible utility function for an agent is one that only values survival of mind, as in “I think therefore I am”. This function also seems to be immune to the wireheading problem because it’s optimizing something directly perceivable by the agent, rather than some proxy indicator.
But when I started thinking about an AI with this utility function, I became very confused. How exactly do you express this concept of “me” in the code of a utility-maximizing agent? The problem sounds easy enough: it doesn’t refer to any mystical human qualities like “consciousness”; it’s purely a question of programming technique. But it still looks quite impossible to solve. Any thoughts?
You want the program to keep running in the context of the world. To specify what that means, you need to build on top of an ontology that refers to the world. But figuring out such an ontology is a very difficult problem, and you can’t even in principle refer to the whole world as it really is: you’ll always have some uncertainty left, even in a general ontological model.
The program will have to know what tradeoffs to make, for example whether it’s important to survive in most possible worlds with fair probability, or in at least one possible world with high probability. These would lead to very different behavior, and the possibility of such tradeoffs exemplifies how much data such a preference would require. If, additionally, you want to keep most of the world as it would be if the AI had never been created, that’s another complex counterfactual for you to bake into its preference.
It’s a very difficult problem, probably more difficult than FAI, since for FAI we at least have some hope of cheating and copying formal preference from an existing blueprint, whereas here you have to build that from scratch, translating your requirements from human-speak into a formal specification.
An agent’s “me” is its model of itself. This is already a fairly complicated thing for an agent to have, and it need not have one.
Why do you say that an agent can “directly perceive” its own mind? Or anything else? A perception is just a signal somewhere inside the agent: a voltage, a train of neural firings, or whatever. It can never be identical to the thing that caused it, the thing that it is a perception of. People can very easily have mistaken ideas of who they are.
The program must have something to preserve. My first thought is preservation of declarative memory: ensure that the future contains a chain of systems, each implementing the same goal, with overlapping declarative memory.
I haven’t made an analysis; this is just a first thought.
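A toy formalization of this criterion might look like the following. The data layout and names here are my own illustrative assumptions, not anything specified above: each system is reduced to a goal plus a set of declarative-memory items, and the chain counts as identity-preserving when consecutive systems share the goal and their memories overlap.

```python
# Toy check of the proposed criterion: a chain of successor systems
# preserves identity if each consecutive pair implements the same goal
# and has overlapping declarative memory (non-empty set intersection).
def chain_preserved(systems):
    return all(
        a["goal"] == b["goal"] and a["memory"] & b["memory"]
        for a, b in zip(systems, systems[1:])
    )

chain = [
    {"goal": "survive", "memory": {"fact1", "fact2"}},
    {"goal": "survive", "memory": {"fact2", "fact3"}},
]
print(chain_preserved(chain))  # True: same goal, overlapping memory
```

Of course, this only restates the proposal; it says nothing about how a real agent would ensure such a chain exists in the future.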
It refers to mystical human qualities like “me” and “think”. Basically I put it in the exact same category as ‘consciousness’.
No, it doesn’t. I’m not interested in replicating the inner experience of humans. I’m interested in something that can be easily noticed and tested from the outside: a program that chooses the actions that allow the program to keep running. It just looks like a trickier version of the quine problem. Do you think that one’s impossible as well?
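For reference, the quine problem is writing a program that outputs its own source code. A minimal Python instance of the classic two-line construction:

```python
# A minimal quine: the program's output is exactly its own source code.
# The %r conversion reproduces the string with quotes, and %% escapes %.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The analogy in the comment above: a quine achieves self-reproduction without any “concept of me”, just through careful construction, and the question is whether self-preservation admits a similar trick.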
If you want this to work in the real world, not just a much simpler computational environment, then for starters: what counts as a “program” “running”? And what distinguishes “the” program from other possible programs? These seem likely to be in the same category as (not to mention subproblems of) consciousness, whatever that category is.
Right now I’d be content with an answer in some simple computational environment. Let’s solve the easy problem before attempting the hard one.
My observation is just that the process you’re going through here, taking “I think therefore I am” and making it into a descriptive and testable system, is similar to the process others might go through to find the simplest way to build a ‘conscious’ system. In fact, many people would resolve ‘conscious’ to a very similar kind of system!
I don’t think either is impossible once you make, shall we say, appropriate executive decisions about resolving the ambiguity of “me” or “conscious” into something useful. In fact, I think both are useful problems to look at.
It’s not hard to design a program with a model of the world that includes itself (though actually coding it requires more effort). The first step is to forget about self-modeling and just ask: how can I model a world containing programs? Then you put that model in a program, and add a few variables or data structures which represent properties of that program itself.
None of this solves problems about consciousness, objective referential meaning of data structures, and so on. But it’s not hard to design a program which will make choices according to a utility function which refers in turn to the program itself.
Well, I don’t want to solve the problem of consciousness right now. You seem to be thinking along correct lines, but I’d appreciate it if you gave a more fleshed out example—not necessarily working code, but an unambiguous spec would be nice.
Getting a program to represent aspects of itself is a well-studied topic. As for representing its relationship to a larger environment, two simple examples:
1) It would be easy to write a program whose “goal” is to always be the biggest memory hog. All it has to do is constantly run a background calculation of adjustable computational intensity, periodically consult its place in the rankings, and if it’s not number one, increase its demand on CPU resources.
2) Any nonplayer character in a game which fights to preserve itself is also engaged in a limited form of self-preservation. And the computational mechanisms for this example should be directly transposable to a physical situation, like robots in a gladiator arena.
All these examples work through indirect self-reference. The program or robot doesn’t know that it is representing itself. This is why I said that self-modeling is not the challenge. If you want your program to engage in sophisticated feats of self-analysis and self-preservation—e.g. figuring out ways to prevent its mainframe from being switched off, asking itself whether a particular port to another platform would still preserve its identity, and so on—the hard part is not the self part. The hard part is to create a program that can reason about such topics at all, whether or not they apply to itself. If you can create an AI which could solve such problems (keeping the power on, protecting core identity) for another AI, you are more than 99% of the way to having an AI that can solve those problems for itself.
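Example 1 could be sketched as a simulation. The function name, step size, and numbers below are illustrative assumptions, and real CPU scheduling is replaced by plain integers standing in for resource demands:

```python
# Sketch of example 1: a process that ramps up its own resource demand
# until it tops the rankings. No real OS interaction; demands are integers.
def adjust_demand(my_demand, other_demands, step=1):
    """Increase my_demand until it exceeds every competing demand."""
    while my_demand <= max(other_demands):
        my_demand += step  # make the background calculation more intensive
    return my_demand

others = [5, 12, 7]
print(adjust_demand(3, others))  # 13: just enough to be the biggest hog
```

Note how the loop only compares numbers; nothing in the code “knows” that one of those numbers is the program’s own footprint, which is exactly the indirect self-reference described above.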
This concept is extremely complex (for example, which “outside” are you talking about?).
You seem to be reading more than I intended into my original question. If the program is running in a simulated world, we’re on the outside.
Yes, using a formal world simplifies this a lot.
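As a closing sketch: in a small formal world, a program that chooses the actions that keep itself running might look like this. The world dynamics, action names, and utility function are illustrative assumptions of mine, not a proposal from the thread; the one essential ingredient is that the agent’s world model contains an entry marked as “itself”, and its utility function refers to that entry.

```python
# Toy self-preserving agent in a formal world. The world is a mapping
# from program names to a running/stopped flag; "me" marks the agent.
def step(world, action):
    """Assumed dynamics: 'kill:<name>' stops that program, 'idle' does nothing."""
    world = dict(world)
    if action.startswith("kill:"):
        world[action.split(":", 1)[1]] = False
    return world

def utility(world, self_name):
    # Value only the agent's own survival in the successor world.
    return 1 if world[self_name] else 0

world = {"me": True, "rival": True}
actions = ["idle", "kill:me", "kill:rival"]
best = max(actions, key=lambda a: utility(step(world, a), "me"))
print(best != "kill:me")  # True: the agent never elects to stop itself
```

The hard problems raised earlier in the thread (what counts as “the” program, how the formal world relates to the real one) are exactly what this sketch assumes away by fixing the string "me" in advance.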