The orthogonality thesis will be false for AIs for the same reasons you rightly say it is false for humans.
We have the desire for certain things. How do we know we have those desires? Because we notice that when we feel a certain way (which we end up calling desires) we tend to do certain things. So we call those feelings the desire to do those things. If we tended to do other things, but felt the same way, we would call those feelings desires for the other things, instead of the first things.
In the same way, AIs will tend to do certain things, and many of those tendencies will be completely independent of some arbitrary goal. For example, suppose there is a paperclipper. It is a physical object, and its parts will have tendencies. Consequently it will have tendencies as a whole, and many of them will be unrelated to paperclips, depending on many factors such as what it is made of and how the parts are put together. When it has certain feelings (presumably) and ends up tending to do certain things—suppose, for instance, it tends to think a long time about certain questions—it will say that those feelings are the desire for those things. So it will believe that it has a desire to think a long time about certain topics, even if that is unrelated to paperclips.
In the case of AIs, some orthogonality is possible if the goal system is kept in a separate text block, but only to some extent.
If the AI is sophisticated enough, it will ask: “What is a paperclip? Why is it in my value box?” And this capacity for reflection (which a self-improving AI needs) will blur the line between an intelligence and its values.
Orthogonality is also in question if the context changes. If the meanings of words and the world model change, the values need to be updated. And the context lives inside the AI, not in its value box.
The separate text block can illustrate what I am saying. You have an AI made of two parts, A and B. Part B contains the value box, which says, “paperclips are the only important thing.” But there is also part A, which is a physical thing, and since it is a physical thing, it will have certain tendencies. Since the paperclippiness is only in part B, those tendencies will be independent of paperclips. When the AI feels those tendencies, it will feel desires that have nothing to do with paperclips.
Maybe, but they could still operate in harmony to reduce the world to a giant pile of paperclips.
“They could still operate in harmony...” Those tendencies were there before anyone ever thought of paperclips, so there isn’t much chance that all of them would work out just in the way that would happen to promote paperclips.
Are we still talking about an AI that can be programmed at will?
I am pointing out that any AI will have aspects you did not program. An AI is not an algorithm; it is a physical object.
Of course, everything is a physical object. What I’m curious about is whether you think you can put any algorithm inside a piece of hardware, or not.
I’m afraid your position on the matter is so foreign to me that without a toy model I wouldn’t be able to understand what you mean. The recursive nature of the comments doesn’t help either.
You can put any program you want into a physical object. But since it is a physical object, it will do other things in addition to executing the algorithm.
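To answer the request for a toy model: here is one crude way to picture that distinction, in code. Every name in it (the class, the properties, the numbers) is an invented illustration, not anything from the actual discussion—the point is only that an object can execute a program while also having physical properties and side effects the program never specifies.

```python
# Toy sketch: a "computer" is a physical object that executes a program,
# but its physical properties (mass, temperature) exist and change
# regardless of what the program says. All names here are illustrative.

class PhysicalComputer:
    def __init__(self, program):
        self.program = program      # the algorithm we chose to install
        self.mass_kg = 2.0          # physical fact, not programmable
        self.temperature_c = 20.0   # physical fact, not programmable

    def run(self, x):
        # Executing the program is one thing the laws of physics do...
        result = self.program(x)
        # ...but they do other things too: the hardware heats up
        # whether or not the program "wants" it to.
        self.temperature_c += 0.5
        return result

paperclip_counter = PhysicalComputer(lambda n: n + 1)
print(paperclip_counter.run(41))        # programmed behavior: 42
print(paperclip_counter.mass_kg)        # property independent of the program: 2.0
print(paperclip_counter.temperature_c)  # side effect the program never specified: 20.5
```

The program fixes only `run`; the mass and the heating come along for free because the thing is a body, which is the point being made above.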
Well, now you’ve got me curious. What other things is a processor doing while executing a program?
I gave the example of following gravity, and in general it is following all of the laws of physics, e.g. by resisting the pressure of other things in contact with it, and so on. Of course, the laws of physics are also responsible for it executing the program. But that doesn’t mean the laws of physics do nothing at all except execute the program—evidently they do plenty of other things as well. And you are not in control of those things and cannot program them. So they will not all work out to promote paperclips, and the thing will always feel desires that have nothing to do with paperclips.
I don’t think “tendencies” is the right word here. A calculator, say, has a keyboard and a processor. The keyboard provides the digits for multiplication, but the processor doesn’t have any tendencies of its own.
But it could still define the context.
The processor has tendencies. It is subject to the law of gravity and many other physical tendencies. That is why I mentioned the fact that the parts of an AI are physical. They are bodies, and have many bodily tendencies, no matter what algorithms are programmed into them.
This is akin to saying that since your kidneys work by extracting ammonia from your blood, you have some amount of desire to drink ammonia.
No. It is akin to saying that if you felt the work of your kidneys, you would call that a desire to extract ammonia from your blood. And you would.
But doesn’t that just reduce to a will to survive? I know that extracting certain salts from my blood is essential to my survival, so I want the parts of me that do exactly that to continue doing so. But I have no specific attachment to that function just because a sub-part of me executes it. If I were in a simulation, say, then even knowing that my simulated kidneys worked in the same way, I know I could continue to exist without that function.
From the wording of your previous comments, it seemed that an AI conscious of its parts should have isomorphic desires, but the problem is that there could be many different isomorphisms, some of which are ridiculous.
We do in fact feel many desires like that, e.g. the desire to remove certain unneeded materials from our bodies, and other such things. The reason you don’t have a specific desire to extract ammonia is that you don’t have a specific feeling for that; if you did have a specific feeling, it would be a desire specifically to extract ammonia, just like you specifically desire the actions I mentioned in the first part of this comment, and just as you can have the specific desire for sex.
I feel we are talking past each other, because reading the comment above I’m totally confused about what question you’re answering...
Let me rephrase my question: if I substitute one of the parts of an AI with an inequivalent part, say a kidney with a solar cell, will its desires change or not?
Let me respond with another question: if I substitute one of the parts of a human being with an inequivalent part, say the nutritional system, so that the human lives on rocks instead of food, will the human’s desires change or not?
Yes, they will, because they will desire to eat rocks instead of what they were eating before.
The same with the AI.
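The exchange above can be put in the same toy style. If felt desires are just labels for what an agent’s parts tend to do, then swapping a part for an inequivalent one changes the desires automatically, with no separate value box to edit. Every part name and desire label below is an invented illustration:

```python
# Toy sketch: "desires" derived directly from the tendencies of an
# agent's parts. Swap a part and the desires change with it.
# All names are illustrative assumptions.

PART_TENDENCIES = {
    "stomach": "desire to eat food",
    "kidney": "desire to filter blood",
    "rock_digester": "desire to eat rocks",
    "solar_cell": "desire to sit in sunlight",
}

def desires(parts):
    """The agent's desires are just labels on its parts' tendencies."""
    return [PART_TENDENCIES[p] for p in parts]

human = ["stomach", "kidney"]
print(desires(human))     # ['desire to eat food', 'desire to filter blood']

# Substitute an inequivalent part: nutritional system -> rock digester.
modified = ["rock_digester", "kidney"]
print(desires(modified))  # ['desire to eat rocks', 'desire to filter blood']
```

Nothing here decides in advance what the agent “should” want; the desires simply track the parts, which is the answer given to the substitution question.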