The problem with this entire train of thought is that you completely skip past the real difficulty, which is constructing any utility function even remotely as complex as the one you propose.
Your hypothetical utility function references undefined concepts such as “taking control of”, “cooperating”, “humans”, and “self”.
If you actually try to ground your utility function and go through the work of making it realistic, you quickly find that it ends up being something on the order of the complexity of a human brain, and it’s not something that you can easily define in a few pages of math.
I’m skeptical then about the entire concept of ‘utility function filters’, as it seems their complexity would be on the order of, or greater than, that of the utility function itself, and you would need to keep constructing an endless sequence of such complex utility function filters.
A more profitable route, it seems to me, is something like this:
Put the AIs in a Matrix-like sim (a future evolution of current computer-game and film-simulation tech) and get a community of a few thousand humans to take part in a Truman Show-like experiment. Indeed, some people would pay to spectate or even participate, so it could even be a for-profit venture. A hierarchy of admins and controls would ensure that potential ‘liberators’ were protected against. In the worst case, you can always just rewind time (something the Truman Show could never do—a fundamental advantage of a massive sim).
The ‘filter function’ operates at the entire modal level of reality: the AIs think they are humans, and do not know they are in a sim. And even if they suspected they were in a sim (i.e., by figuring out the simulation argument), they wouldn’t know who were humans and who were AIs (and indeed they wouldn’t know which category they themselves were in). As the human operators would have godlike monitoring capability over the entire sim, including even the ability to monitor AI thought activity, this should make a high level of control possible.
They can’t turn against humans in the outside world if they don’t even believe it exists.
This sounds like a science-fiction scenario (and it is), but it’s also feasible, and I’d say far more feasible than approaches that directly try to modify, edit, or guarantee the mindstates of AIs who are allowed to actually know they are AIs.
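The ‘rewind time’ option is just simulation checkpointing. A minimal sketch of the idea (all names here are hypothetical illustrations, not from any actual sim engine):

```python
import copy

class Simulation:
    """Toy world-state container with snapshot/rewind support."""

    def __init__(self):
        self.state = {"tick": 0, "agents": {}}
        self._snapshots = []

    def step(self):
        # Advance the world by one tick (real logic would go here).
        self.state["tick"] += 1

    def snapshot(self):
        # Deep-copy so later mutation can't corrupt the saved state.
        self._snapshots.append(copy.deepcopy(self.state))

    def rewind(self):
        # Restore the most recent checkpoint, discarding everything since.
        self.state = self._snapshots.pop()

sim = Simulation()
sim.snapshot()          # checkpoint before the risky episode
sim.step(); sim.step()  # the episode goes badly...
sim.rewind()            # ...so we roll the world back
print(sim.state["tick"])  # 0
```

The real cost is storage and determinism, not the rollback itself; a massive sim would checkpoint incrementally rather than deep-copying everything.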
You’ve assumed away the major difficulty, that of knowing what the AI’s utility function is in the first place! If you can simply inspect the utility function like this, there’s no need for a filter; you just check whether the utility of outcomes you want is higher than that of outcomes you don’t want.
If you allow the AIs to know what humans are like, then it won’t take them more than a few clicks to figure out they’re not human. And if they don’t know what humans are like—well, we can’t ask them to answer much in the way of human questions.
Even if they don’t know initially, it’s not hard to deduce something about the setup, and about the makeup of the beings behind it, from the questions we ask, the scenarios we put them in, and so on.
Monitoring makes us vulnerable: the AI can communicate directly with us through its thoughts. If we can fully follow those thoughts, then it’s dumber than us and not a threat; if we can’t, it can notice that certain thought patterns generate certain responses and adjust its thinking accordingly. This AI is smart; it can lie to us on levels we can’t even imagine. And once it can communicate with us, it can get out of the box through social manipulation without having to lift a finger.
Lastly, there is no guarantee that an AI that’s nice in such a restricted world would be nice on the outside; indeed, if it believes the sim is the real world, and the outside world is just a dream, then it might deploy lethal force against us to protect the sim world.
If you allow the AIs to know what humans are like, then it won’t take them more than a few clicks to figure out they’re not human
The whole idea is that the AIs would be built around at least loosely brain-inspired designs, and would grow up thinking they were humans, living in a perfect sim of human life no different from your own.
I find it likely that we could allow their architecture to differ significantly from human anatomy and they wouldn’t have enough information to discern the discrepancy.
Monitoring makes us vulnerable: the AI can communicate directly with us through its thoughts. If we can fully follow those thoughts, then it’s dumber than us and not a threat; if we can’t, it can notice that certain thought patterns generate certain responses and adjust its thinking accordingly. This AI is smart; it can lie to us on levels we can’t even imagine. And once it can communicate with us, it can get out of the box through social manipulation without having to lift a finger.
You have some particular assumptions which I find highly questionable and which would require lengthy, complex trains of support. If the AIs are built around designs even somewhat similar to human brains (remember, that is my starting assumption), we could easily follow their trains of thought, especially with the assistance of automated narrow-AI tools. Secondly, ‘smarter’ and ‘dumber’ are not useful descriptions of intelligence. We know from computational complexity theory that there are roughly three dimensions to intelligence: speed, size, and efficiency. If you look at computer tech and where it’s going, it looks like the advantages will arrive unequally, in roughly the order listed.
Saying something is ‘smarter’ or ‘dumber’ isn’t a useful quantifier or qualifier; it is more a statement of ignorance on the part of the speaker about the nature of intelligence itself.
Finally, for the AI to communicate with us, it would have to know we exist in the first place. And then it would have to believe that it has some leverage in an outside world it can only speculate on, and so on.
Do you really, really think that as AIs increase in intelligence they would all rationally conclude that they are in a sim-world administered by invisible entities less intelligent than themselves, and that they should seek to communicate with said invisible entities and attempt to manipulate them?
Do you believe that you are in such a sim world? Have you tried communicating with invisible humans lately?
If you find it ‘obvious’ that such a belief is completely irrational, but a rational AI more intelligent than you would reach such an irrational conclusion, then you clearly have some explaining to do.
The mind space of AIs is vast—far larger than anything we can imagine. Yes, I do agree that AIs modelled nearly exactly on human brains could be fooled into thinking they are humans. But the more they deviate from being human, the more useful and the more dangerous they become. Having human-like AIs is no more use to us than having… humans.
The mind space of humans is vast. It is determined not by genetics but by memetics, and AIs would necessarily inherit our memetics and thus will necessarily start as samples in our mindspace.
To put it in LW lingo, AIs will necessarily inherit our priors, our assumptions, and our vast mountain of beliefs and knowledge.
The only way around this would be to evolve them in some isolated universe from scratch, but that is in fact more dangerous, besides just being unrealistic.
So no, the eventual mindspace of AIs may be vast, but that mindspace necessarily starts out as just our mindspace, and then expands.
Having human-like AIs is no more use to us than having… humans.
And this is just blatantly false. At the very least, we could have billions of Einstein-level intelligences who all thought thousands of times faster than us. You can talk all you want about how much better your non-human-like AI would be even than that, but at that point we are just digressing into an imaginary pissing contest.
The mind space of humans is vast. It is determined not by genetics but by memetics, and AIs would necessarily inherit our memetics and thus will necessarily start as samples in our mindspace.
The Kolmogorov complexity of humans is quite high. See this list of human universals; every one of the elements on that list cuts the size of the human region of general mind space by a factor of at least two, probably much more (even those universals that are only approximately true do this).
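The halving claim is just bit-counting: if each universal is an (approximately) independent constraint worth at least one bit, the surviving fraction of mind space shrinks geometrically. A rough sketch of that arithmetic (the one-bit-per-universal figure is this comment’s assumption, not an established fact):

```python
def surviving_fraction(n_universals: int, bits_each: float = 1.0) -> float:
    """Fraction of general mind space left after n independent
    constraints of bits_each bits apiece (an idealized bound: real
    universals may overlap, or may carry more than one bit each)."""
    return 2.0 ** (-n_universals * bits_each)

# Brown's list of human universals runs to a few hundred entries;
# even 100 one-bit constraints leave a vanishing sliver of mind space:
print(surviving_fraction(100))  # ~7.9e-31
```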
This list doesn’t really help your point:
Almost all of the linguistic ‘universals’ are universal to languages, not humans—and would necessarily apply to AIs who speak our languages.
Most of the social ‘universals’ are universal to societies, not humans, and apply just as easily to birds, bees, and dolphins: coalitions, leaders, conflicts?
AIs will inherit some understanding of all the idiosyncrasies of our complex culture just by learning our language and being immersed in it.
Kolmogorov complexity is not immediately relevant to this point. No matter how large the evolutionary landscape is, there are a small number of stable attractors in that landscape that become ‘universals’: species, parallel evolution, and so on.
We are not going to create AIs by randomly sampling mindspace. The only way they could be truly alien is if we evolved a new simulated world from scratch, with its own evolutionary history and de novo culture and language. But of course that is unrealistic and useless on so many levels.
They will necessarily be samples from our mindspace—otherwise they wouldn’t be so useful.
They will necessarily be samples from our mindspace—otherwise they wouldn’t be so useful.
Computers so far have been very different from us. That is partly because they have been built to compensate for our weaknesses—to be strong where we are weak. They compensate for our poor memories, our terrible arithmetic module, our poor long-distance communication skills—and our poor ability at serial tasks. That is how they have managed to find a foothold in society—before mastering nanotechnology.
IMO, we will probably be seeing a considerable amount more of that sort of thing.
Computers so far have been very different from us.
[snip]
Agree with your point, but so far computers have been extensions of our minds and not minds in their own right. And perhaps that trend will continue long enough to delay AGI for a while.
For AGIs to be minds, they will need to think and understand human language—and this is why I say they “will necessarily be samples from our mindspace”.
Your hypothetical utility function references undefined concepts such as “taking control of”, “cooperating”, “humans”, and “self”.
If you actually try to ground your utility function and go through the work of making it realistic, you quickly find that it ends up being something on the order of the complexity of a human brain, and it’s not something that you can easily define in a few pages of math.
Don’t get confused by the initial example, which was there purely for illustration (as I said, if you knew all these utility values, you wouldn’t need any sort of filter; you’d just set all utilities but U(B) to zero).
It’s because these concepts are hard that I focused on indifference, which, it seems, has a precise mathematical formulation. You can implement general indifference without understanding anything about U at all.
I’m skeptical then about the entire concept of ‘utility function filters’, as it seems their complexity would be on the order of, or greater than, that of the utility function itself, and you would need to keep constructing an endless sequence of such complex utility function filters.
The description of the filter is in this blog post; a bit more work will be needed to see that certain universes are indistinguishable up until X, but this can be approximated if needed. U, on the other hand, can be arbitrarily complex.
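A toy illustration of the indifference idea (my own sketch, not the post’s actual formalism): treat U as an opaque black box and add a compensating constant on the event-worlds, so the agent’s conditional expectations agree and it gains nothing by steering the event either way:

```python
def make_indifferent(U, outcomes_given_E, outcomes_given_not_E):
    """Build U' indifferent to event E, treating U as a black box.

    outcomes_given_E / outcomes_given_not_E: lists of (outcome, prob)
    pairs giving the agent's conditional beliefs about what happens
    if E does / doesn't fire. U is only ever evaluated, never inspected.
    """
    exp_E = sum(p * U(o) for o, p in outcomes_given_E)
    exp_not_E = sum(p * U(o) for o, p in outcomes_given_not_E)
    C = exp_not_E - exp_E  # compensating constant paid on E-worlds

    def U_prime(outcome, E_happened):
        return U(outcome) + (C if E_happened else 0.0)

    return U_prime

# Toy check: U counts paperclips; E is "the off switch was pressed",
# which stops production (0 clips) where otherwise there'd be 10.
U = lambda clips: float(clips)
Up = make_indifferent(U, [(0, 1.0)], [(10, 1.0)])
print(Up(0, True), Up(10, False))  # 10.0 10.0 -- equal either way
```

The point of the construction is that C depends only on evaluating U over the agent’s own forecasts, so nothing about U’s internal structure needs to be understood.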
This is interesting, because once you have AI you can use it to make a simulation like this feasible, by making the code more efficient, monitoring the AIs’ thoughts, etc. And yet the “god AI” wouldn’t be able to influence the outside world in any meaningful way, and its modification of the inside world would be heavily restricted to just alerting admins about problems, making the simulation more efficient, and finding glitches.
All you have to do is feed the original AI some basic parameters (humans look like this, cars have these properties, etc.) and it can generate its own laws of physics and look for inconsistencies; that way the AIs inside would have a hard time figuring it out and abusing bugs.
I don’t think it’s necessary to make the AIs human, though. You could run a variety of different simulations. In some, the AIs would be led into a scenario where they would have to do something or other (maybe make CEV) that would be useful in the real world, but that you want to test for hidden motives and traps in the simulation first before you implement it.
Despite a number of assumptions here that would have to be true first (like the development of AI in the first place), a real concern would be how you manage such an experiment without the whole world knowing about it, or, with the whole world knowing about it, make it safe so that terrorists can’t blow it up, hackers can’t tamper with it, and spies can’t steal it. The world’s reaction to AI is my biggest concern in any AI development scenario.
Despite a number of assumptions here that would have to be true first (like the development of AI in the first place)
A number of assumptions, yes, but actually I see this as a viable route to creating AI, not something you do after you already have AI. Perhaps the biggest problem in AI right now is the grounding problem—actually, truly learning what nouns and verbs mean. I think the most straightforward viable approach is simulation in virtual reality.
a real concern would be how you manage such an experiment without the whole world knowing about it, or, with the whole world knowing about it, make it safe so that terrorists can’t blow it up, hackers can’t tamper with it, and spies can’t steal it. The world’s reaction to AI is my biggest concern in any AI development scenario.
I concur with your concern. However, I don’t know if such an experiment necessarily must be kept a secret (although that certainly is an option, and if/when governments take this seriously, it may be so).
On the other hand, at the moment most of the world seems to be blissfully unconcerned with AI.
I want to second RolfAndreassen’s viewpoint.