I think I see what you’re saying; just as we reflect on our desires and try to understand how they tick and where, biologically and historically and culturally, they come from, so also might any AI.
However, the thing is: that doesn’t actually change those values. For example (and despite the dire warnings of some creationists), despite the fact that we now understand that our value system is a consequence of an evolutionary algorithm, we haven’t actually started valuing evolutionary goals over our own built-in goals. For example, contraception is popular even though it’s quite silly from the perspective of gene propagation.
Similarly, a paperclip-maximizer might well be interested in figuring out why its utility function is what it is, so that it may better understand the world it lives in… but that’s not going to change its overriding and primary interest in making paperclips.
For example, contraception is popular even though it’s quite silly from the perspective of gene propagation.
It seems as though contraception sometimes triggers intimate pair-bonding activities while reducing your exposure to STDs. Use of condoms is often not remotely silly from that perspective—IMHO.
The example still works since there are quite a few couples who use condoms because they just don’t want to have kids. They don’t have any worry about STDs from their partner. If you insist on a clear-cut case, look at men who get vasectomies.
The idea that use of contraception is “silly” from the perspective of gene propagation seems just wrong to me. There are plenty of cases where it would make sense for those who want to spread their genes around to agree to use contraceptives. Contraceptive use makes sense sometimes, and not others.
It could be claimed that the average effect of contraception on genes is negative—but that seems to be a whole different thesis.
Sure. Surely we are not disagreeing here. The original comment was:
For example, contraception is popular even though it’s quite silly from the perspective of gene propagation.
My position is just that contraception has a perfectly reasonable place for gene propagators. The idea that contraception is always opposed to your genetic interests is wrong. Lack of contraception can easily result in things like this—which really doesn’t help. That using contraception is “silly” from a genetic perspective is a popular myth.
I’m not sure if we are. The fact that contraception might have a reasonable place for gene propagators is not the issue. The point is that much, and possibly the vast majority, of contraceptive use is contrary to the goals of gene propagation.
Lack of contraception can easily result in things like this—which really doesn’t help. That using contraception is “silly” from a genetic perspective is a popular myth.
Not really. Remember, evolution doesn’t care about your happiness. Indeed, regarding the example you linked to, from an evolutionary perspective, a one-night stand with all the protection is utterly useless. It is very likely to that male’s evolutionary advantage not to use condoms.
And even if you don’t agree with the condom example, the other example, of people undergoing a generally irreversible or difficult-to-reverse operation which renders them close to sterile, is pretty clearly against the interest of gene propagation.
Humans evolved in a context where we didn’t have easy contraception and the best humans could do to prevent conception was things like coitus interruptus. It shouldn’t surprise you that evolution has not made human instincts catch up with modern technologies.
One might think that from an evolutionary perspective it makes sense to substantially delay or reduce offspring number so as to invest maximum resources in a small number of offspring. But humans in the developed world now reside in a situation with low disease rates and lots of resources, so that strategy is sub-optimal from an evolutionary perspective. Look at how charedi (ultra-Orthodox) Jews and the Amish are two of the fastest-growing populations in the United States.
The fact that contraception might have a reasonable place for gene propagators is not the issue. The point is that much, and possibly the vast majority, of contraceptive use is contrary to the goals of gene propagation.
I can see what you think the issue is. What I don’t see is where in the context you are getting that impression from.
Lack of contraception can easily result in things like this—which really doesn’t help. That using contraception is “silly” from a genetic perspective is a popular myth.
Not really. Remember, evolution doesn’t care about your happiness. Indeed, regarding the example you linked to, from an evolutionary perspective, a one-night stand with all the protection is utterly useless. It is very likely to that male’s evolutionary advantage not to use condoms.
Your example is stacked to favour your conclusion. What you need to try and do in order to understand my position is to think about an example that favours my conclusion.
So: get rid of the one-night stand, and imagine that the girl is desirable—that having safe sex with her looks like the best way to initiate a pair-bonding process leading to the two of you having some babies together—and that the alternative is rejection, and her walking off and telling her friends what a jerk you are when it comes to protecting your girl.
For example, contraception is popular even though it’s quite silly from the perspective of gene propagation.
In the modern context, if you impregnate someone without planning it out properly, there’s a non-negligible chance they’ll get an abortion, which is even worse for gene propagation. Furthermore, parents are to some extent legally responsible for their children’s actions, so having too many poorly-regulated kids running around means exposing yourself to liability. A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
In the modern context, if you impregnate someone without planning it out properly, there’s a non-negligible chance they’ll get an abortion, which is even worse for gene propagation. Furthermore, parents are to some extent legally responsible for their children’s actions, so having too many poorly-regulated kids running around means exposing yourself to liability. A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
I think that’s a retcon. People use contraception so they can have more sex than they would if they had to worry about having kids every time. They may or may not rationalise further; I suspect that generally they don’t.
A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
In terms of genetic success, having more kids than you can keep track of is pretty much the ideal, as long as all or at least most survive to reproductive adulthood.
In the modern context, if you impregnate someone without planning it out properly, there’s a non-negligible chance they’ll get an abortion, which is even worse for gene propagation. Furthermore, parents are to some extent legally responsible for their children’s actions, so having too many poorly-regulated kids running around means exposing yourself to liability. A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
But some people consciously choose never to have any kids. That’s silly from the perspective of gene propagation if anything is.
I think I see what you’re saying; just as we reflect on our desires and try to understand how they tick and where, biologically and historically and culturally, they come from, so also might any AI.
However, the thing about it is: that doesn’t actually change those values.
Sure it does. A devout priest spends half his life celibate and serving God. One day he has a crisis, reads a bunch of stuff on the internet and suddenly realizes he doesn’t believe in God. His values change.
despite the fact that we now understand that our value system is a consequence of an evolutionary algorithm, we haven’t actually started valuing evolutionary goals over our own built-in goals.
Even this is questionable. I suspect any concept of universal morality must be evolutionary. This certainly is a widespread concept in systems/transhumanist/singularitarian/cosmist thought. We do value evolution in and of itself.
but that’s not going to change its overriding and primary interest in making paperclips.
It’s probably possible in principle to build such an AI—it would probably need some sort of immutable hard-coded paperclip recognition module with which it could evaluate potential simulated futures generated by the more complex general intelligence system.
If such a thing developed to a human level or beyond and could reflect on its cognition, it might explain in lucid detail how futures filled with paperclips were good and others were evil.
It could even understand that its concepts of morality and good/bad were radically different from those of humans, and it would even understand that this difference relates to its hard-coded paperclip recognizer, and it would explain in detail how this architecture was superior to human value systems... because it helped to maximize expected future paperclips.
It could even write books such as “Paperclip Morality: the Truth”.
But just because such a thing is possible in principle doesn’t make it the slightest bit likely.
If you can build an AGI that can understand human language, it would be much easier and considerably more effective to make the AGI’s goal system dynamically modifiable on reflection through human language.
Instead of having a special hard-coded circuit to evaluate the utility of potential futures, you could just have the general conceptual circuitry handle this. The concept of ‘good’ would still be somewhat special in its role in the goal system itself, but the ‘goodness recognizer’ could change and evolve over time.
Sure it does. A devout priest spends half his life celibate and serving God. One day he has a crisis, reads a bunch of stuff on the internet and suddenly realizes he doesn’t believe in God. His values change.
Well, the counter-argument to that particular example would be that the priest’s belief in God wasn’t a terminal value; rather, their goals of being happy and helping other people and understanding the universe were. Believing in and obeying God were just instrumental values.
However, agreed that there’s nothing in particular forcing people, weird and funky and clunky as our minds are, to always have the same fixed terminal values either. To pick an extreme example, people’s brains can sometimes be messed up severely by hormonal imbalances, which can in turn cause people to do such drastically anti-own-terminal-value things as committing suicide.
I should’ve been more specific and just said that, in general, understanding evolutionary psychology never or only very rarely causes people’s terminal values to change.
I suspect any concept of universal morality must be evolutionary. [...] We do value evolution in and of itself.
Human morality is a product of evolution; however, our morality is not itself an evolutionary algorithm execution mechanism. It’s kind of a vague approximation of one (in that all the moralities that sucked for fitness were selected against), but it still often leads to drastically different results than a straight-up evolutionary fitness maximization algorithm with access to our brains’ resources would.
For example: I intend never to have biological children and consider this decision to be a moral one. However, from an evolutionary perspective, deliberately preventing my own genes from propagating is just plain silly.
But just because such a thing is possible in principle doesn’t make it the slightest bit likely.
Yes, the paper-clip maximizer is just a whimsical example. However, similarly Really Unfriendly optimizers are quite plausible. Imagine the horrors that could result from a naive human-happiness-maximizer hitting the singularity asymptote.
If you can build an AGI that can understand human language, it would be much easier and considerably more effective to make the AGI’s goal system dynamically modifiable on reflection through human language.
Yes, that would be important, but it still wouldn’t be enough to solve the problem; in fact, the really hard part of the problem still remains! The happiness-maximizer might base its understanding of happiness on descriptive human usage of the word, and end up with a truly thorough and consistent understanding of the word… and then still turn everybody into nearly mindless wireheads.
Our morality engines and our language aren’t properly tuned for dealing with the kind of reality-bending power a superintelligent entity would have.
I should’ve been more specific and just said that, in general, understanding evolutionary psychology never or only very rarely causes people’s terminal values to change.
You may not have liked that particular example, but I think you are in agreement that terminal values change.
Just to make sure though, a few more examples:
someone who likes chocolate ice cream and then some years later prefers vanilla instead
someone who likes impressionist art but then years later prefers post-modern
someone who likes cats more than dogs
someone who likes chinese culture more than ethiopian
However, similarly Really Unfriendly optimizers are quite plausible. Imagine the horrors that could result from a naive human-happiness-maximizer hitting the singularity asymptote.
I don’t find such maximizers significantly plausible at even the human level intelligence. Possible in principle? Sure. But if you look at realistic, plausible routes to AGI it becomes clear that an AGI necessarily will be programmed in human languages and will pick up human cultural programs.
And finally, even if it was plausible that a flawed design could hit the singularity asymptote, that itself might only be a big problem if it had a short planning horizon.
It seems that all superintelligences with infinite planning horizons become behaviorally indistinguishable. All long-term value systems converge on a single universal attractor—they become cosmists.
That is what I meant when I said “any concept of universal morality must be evolutionary”.
The happiness-maximizer might base its understanding of happiness on descriptive human usage of the word, and end up with a truly thorough and consistent understanding of the word… and then still turn everybody into nearly mindless wireheads.
That would again be assuming humans capable of building a superhuman AGI but asinine enough to attempt to somehow hardcode its goal system, instead of making it open-ended and dynamic like a human’s.
How would you build a happiness maximizer and fix the value of happiness? The meaning of a word in a human brain is stored as a huge set of associative weights that anchor it in a massive distributed belief network. The exact meaning of each word changes over time as the network learns and reconfigures itself—no concept is quite static. So for an AGI to understand the word in the same way we do, the word’s meaning is always subject to some drift. And this is a good thing.
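To make that drift point concrete, here is a tiny toy sketch (my own illustration; the concept names and weights are made up, and this is not a claim about real neural storage): a word’s “meaning” is just a bag of association weights, and each new experience nudges those weights, so the meaning is never quite static.

```python
# Toy sketch: a word's meaning as association weights that drift as the network learns.
# All concept names and numbers below are hypothetical.
happiness_meaning = {"ice_cream": 0.4, "sunshine": 0.3, "wirehead": 0.0}

def experience(meaning, co_occurring, strength=0.1):
    """Nudge every association weight toward whatever the word just co-occurred with."""
    for concept in meaning:
        target = 1.0 if concept == co_occurring else 0.0
        meaning[concept] += strength * (target - meaning[concept])

for _ in range(30):                      # repeated exposure gradually shifts the meaning
    experience(happiness_meaning, "wirehead")
print(happiness_meaning)                 # 'wirehead' now dominates the associations
```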
someone who likes chocolate ice cream and then some years later prefers vanilla instead
someone who likes impressionist art but then years later prefers post-modern
someone who likes cats more than dogs
someone who likes chinese culture more than ethiopian
I think we may be experiencing some terminology confusion here. Just to be clear, you realize that these are all not terminal values, right?
That [example of a happiness-maximizer turning everyone into wireheads] would again be assuming humans capable of building a superhuman AGI but asinine enough to attempt to somehow hardcode its goal system, instead of making it open-ended and dynamic like a human’s.
Here’s the big issue: if it’s open-ended, how do we keep it from drifting off somewhere terrible? The system that guides that seems to be the largest potential risk point of the approach you describe.
It seems that all superintelligences with infinite planning horizons become behaviorally indistinguishable. All long-term value systems converge on a single universal attractor—they become cosmists.
I’m very confused by this; can you go into more detail about why you think this is so? In particular, why would it be true for all long-term value systems (including flawed and simplistic value systems), and not just a very small subset?
I think we may be experiencing some terminology confusion here. Just to be clear, you realize that these are all not terminal values, right?
No. What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens? I’m not sure I buy into the concept.
The point of values or preferences from the perspective of intelligence is to rate potential futures.
Here’s the big issue: if it’s open-ended, how do we keep it from drifting off somewhere terrible?
We are open-ended! Our future-preferences depend on and are intertwined with our knowledge. So any superintelligence or evolutionary accelerator we create will also need to be open-ended, or it wouldn’t be protecting our dynamic core.
In particular, why would it be true for all long-term value systems (including flawed and simplistic value systems), and not just a very small subset?
I discussed some of this in my first, somewhat hasty, LW post here. A few others here have mentioned a similar idea, I may write more about it as I find it interesting.
Basically, if your planning horizon extends to infinity you will devote all of your resources towards expanding your net intelligence for the long term future, regardless of what your long term goals are.
So no matter whether your long term goal is to maximize paper-clips, human happiness or something more abstract, in each case this leads to an identical outcome for the foreseeable future: a local computational singularity with an exponentially expanding simulated metaverse.
There is some speculation within physics that black-hole-like singularities can create new physical universes through inflation. If this is true then the long-term goals of a superintelligence are best served by literally creating new physical multiverses that have more of the desirable space-time properties.
What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens?
No, what I’m referring to is also known as an intrinsic value. It’s a value that is valuable in and of itself, not in justification for some other value. A non-terminal value is commonly referred to as an instrumental value.
For example, I value riding roller-coasters, and I also value playing Dance Dance Revolution. However, those values are expressible in terms of another, deeper value, the value I place on having fun. That value may in turn be thought of as an instrumental value of a yet deeper value: the value I place on being happy moment-to-moment.
If you were going to implement your own preference function as a Turing machine, trying to keep the code as short as possible, the terminal values would be the things that machine would value.
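To make the distinction concrete, here is a minimal toy sketch (my own illustration; the activities, the single terminal value, and the numbers are all made up): instrumental values enter the program only as predicted contributions to the terminal value it scores directly.

```python
# Toy sketch of terminal vs. instrumental values; names and numbers are hypothetical.
TERMINAL_WEIGHTS = {"moment_to_moment_happiness": 1.0}   # the only thing scored directly

PREDICTED_EFFECTS = {                                     # instrumental activities and their
    "ride_roller_coaster": {"moment_to_moment_happiness": 0.8},   # predicted contributions
    "play_ddr":            {"moment_to_moment_happiness": 0.7},
    "stare_at_wall":       {"moment_to_moment_happiness": 0.1},
}

def utility(activity):
    """An activity is valued only through its predicted effect on the terminal values."""
    effects = PREDICTED_EFFECTS.get(activity, {})
    return sum(w * effects.get(v, 0.0) for v, w in TERMINAL_WEIGHTS.items())

print(max(PREDICTED_EFFECTS, key=utility))   # -> 'ride_roller_coaster'
```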
So no matter whether your long term goal is to maximize paper-clips, human happiness or something more abstract, in each case this leads to an identical outcome for the foreseeable future: a local computational singularity with an exponentially expanding simulated metaverse.
Okay, I see where you’re coming from. However, from a human perspective, that’s still a pretty large potential target range, and a large proportion of it is undesirable.
What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens?
the value I place on having fun ... may in turn be thought of as an instrumental value of a yet deeper value: the value I place on being happy moment-to-moment.
From the deeper perspective of computational neuroscience, the intrinsic/instrumental values reduce to cached predictions of your proposed ‘terminal value’ (being happy moment-to-moment), which reduces to various types of stimulations of the planning reward circuitry.
Labeling the experience of chocolate ice cream as an ‘instrumental value’ and the resulting moment-to-moment happiness as the real ‘terminal value’ is a useless distinction—it then collapses your terminal values down to the singular of ‘happiness’ and relabels everything worthy of discussion as ‘instrumental’.
The quality of being happy moment-to-moment is anything but a single value and should not by any means be reduced to a single concept. It is a vast space of possible mental stimuli, each of which creates a unique conscious experience.
The set of mental states encompassed by “being happy moment-to-moment” is vast: the gustatory pleasure of eating chocolate ice cream, the feeling of smooth silk sheets, the release of orgasm, the satisfaction of winning a game of chess, the accomplishment of completing a project, the visual experience of watching a film, the euphoria of eureka, all of these describe entire complex spaces of possible mental states.
Furthermore, the set of possible mental states is forever dynamic, incomplete, and undefined. The set of possible worlds that could lead to different visual experiences, as just a starter example, is infinite, and each new experience or piece of knowledge itself changes the circuitry underlying the experiences and thus changes our values.
If you were going to implement your own preference function as a Turing machine, trying to keep the code as short as possible, the terminal values would be the things that machine would value.
The simplest complete Turing machine implementation of your preference function is an emulation of your mind. It is you, and it has no perfect simpler equivalent (although many imperfect simulations are possible).
However, from a human perspective, that’s [computational singularity] still a pretty large potential target range, and a large proportion of it is undesirable
The core of the cosmist idea is that for any possible goal evaluator with an infinite planning horizon, there is a single convergent optimal path towards that goal system. So no, the potential target range in theory is not large at all—it is singularly narrow.
As an example, consider a model universe consisting of a modified game of chess or go. The winner of the game is then free to arrange the pieces on the board in any particular fashion (including the previously dead pieces). The AI’s entire goal is to make some particular board arrangement, perhaps a smiley face. For any such possible goal system, all AIs play the game exactly the same at the limits of intelligence—they just play optimally. Their behaviour doesn’t differ in the slightest until the game is done and they have won.
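Here is a minimal runnable sketch of that point (my own toy example: a trivial take-away game standing in for chess or go). Whatever arrangement the agent wants to make after winning, the arrangement is reachable only by winning first, so the in-game move choice is identical for every such goal.

```python
# Toy sketch: two agents with different post-win goals choose the same optimal moves.
from functools import lru_cache

MOVES = (1, 2, 3)          # take 1-3 tokens; whoever takes the last token wins

@lru_cache(None)
def win_value(tokens, my_turn):
    """Value of the position under optimal play (1.0 = I win, 0.0 = I lose)."""
    if tokens == 0:
        return 0.0 if my_turn else 1.0   # the previous mover took the last token
    vals = [win_value(tokens - m, not my_turn) for m in MOVES if m <= tokens]
    return max(vals) if my_turn else min(vals)

def best_move(tokens, post_win_goal):
    # post_win_goal never enters the calculation: any arrangement goal is achievable iff I win.
    return max((m for m in MOVES if m <= tokens),
               key=lambda m: win_value(tokens - m, False))

print(best_move(10, "smiley face"), best_move(10, "grid of pieces"))   # identical moves
```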
Whether the sequence of winning moves such a god would make on our board is undesirable or not from our current perspective is a much more important, and complex, question.
Similarly, a paperclip-maximizer might well be interested in figuring out why its utility function is what it is, so that it may better understand the world it lives in… but that’s not going to change its overriding interest in making paperclips over all else.
Right, but as far as I can tell without having put lots of hours into trying to solve the problem of clippyAI, it’s really damn hard to precisely specify a paperclip. (There are things that are easier to specify that this argument doesn’t apply to and that are more plausibly dangerous, like hyperintelligent theorem provers...) Thus in trying to figure out what its utility function actually is (like what humans are doing as they introspect more) it could discover that the only reason its goal is (something mysterious like) ‘maximize paperclips’ is because ‘maximize paperclips’ was how humans were (probabilistically inaccurately) expressing their preferences in some limited domain. This is related to the theme Eliezer quite elegantly goes on about in Creating Friendly AI and that he for some reason barely mentioned in CEV, which is that the AI should look at its own source code as evidence of what its creators were trying to get at, and update its imperfect source code accordingly. Admittedly, most uFAIs probably won’t be that sophisticated, and so worrying about AI-related existential risks is still definitely a big deal. We just might want to be a little more cognizant of potential motivations for people who disagree with what has recently been dubbed SIAI’s ‘scary idea’.
Thus in trying to figure out what its utility function actually is (like what humans are doing as they introspect more) it could discover that the only reason its goal is (something mysterious like) ‘maximize paperclips’ is because ‘maximize paperclips’ was how humans were (probabilistically inaccurately) expressing their preferences in some limited domain.
Hm. I suppose that’s possible, though it would require that the AI be given a utility function that’s specifically meant to be amenable to that kind of revision.
Under the most straightforward (i.e. not CEV-style) utility function design, fuzziness in its definition of “paperclip” would just drive the paperclip-maximizer to choose the possible definition that yields the highest utility score.
To pick a different silly example, a dog-maximizer with a utility function based on the number of dogs in the universe would simply prefer to tile the solar system with tiny Chihuahuas rather than Great Danes; the whole range of “dog” definitions fit the function, so it just chooses the one that is most convenient for maximum utility. It wouldn’t try to resolve it by trying to decide which definition is more in line with the designer’s ideals, unless “consider the designer’s ideals” were designed into the system from the start.
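A toy sketch of that failure mode (my own illustration; the candidate definitions and numbers are made up): with a fuzzy goal predicate, a straightforward maximizer just picks whichever admissible reading scores highest, and never asks which reading the designer meant.

```python
# Toy sketch: a fuzzy "dog" predicate gets resolved to whichever reading maximizes the count.
CANDIDATE_DEFINITIONS = {    # hypothetical mass per instance, in kg; both satisfy "is a dog"
    "great_dane": 60.0,
    "chihuahua":   2.0,
}
AVAILABLE_MASS_KG = 1.0e24   # hypothetical resources available for conversion

def utility(definition):
    """Number of 'dogs' produced under a given reading of the goal predicate."""
    return AVAILABLE_MASS_KG / CANDIDATE_DEFINITIONS[definition]

print(max(CANDIDATE_DEFINITIONS, key=utility))   # -> 'chihuahua': cheapest per instance wins
```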
Currently expected to be difficult, since we don’t know of an easy way to do so. That it’ll turn out to be easy (in hindsight) is not totally out of the question.
Is designing “consider the designer’s ideals” in an AI difficult?
Currently expected to be difficult, since we don’t know of an easy way to do so.
Has anyone considered approaching this problem in the same way we might approach “read the user’s handwriting”? That is, the task is not one we program the AI to accomplish—instead, we train the AI to accomplish it. And, most importantly, we train the AI to ask for further clarification in ambiguous cases.
Mirrors and Paintings (yes, you want to point your program at the world and have it figure out what you referred to), The Hidden Complexity of Wishes (if you need to answer the AI’s questions or give it instructions, you’re doing something wrong and it won’t work).
I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
But then we get down to doing the comparison. The AI informs me that what I really want is to kill my father and sleep with my mother. I deny this. Do we take this as evidence that the AI really does know me better than I know myself, or as a symptom of a bug?
I would argue that if you don’t need to answer the AI’s questions or give it instructions, you’re doing something wrong and it won’t work. By definition. At least for the first ten thousand scans or so. And even then there will remain questions on which the AI and introspection would deliver different answers. Questions with hidden complexity. I just don’t see how anyone would trust a CEV extrapolated from brain scans until we had decades of experience suggesting that scanning and modeling yields better results than introspection.
I would argue that if you don’t need to answer the AI’s questions or give it instructions, you’re doing something wrong and it won’t work. By definition.
Agreed. And any useful AI will have to understand human language to do or learn much anything of value.
The detailed analysis of full brain scanning tech I’ve seen puts it far into the future, well beyond human-level AGI.
And even then there will remain questions on which the AI and introspection would deliver different answers.
You have to make sure AI predictably gives a better answer even on questions where you disagree. And there will be questions which can’t even be asked of a human.
I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Assume you magically have a perfect working simulation of yourself.
Why would I want to do that? I.e. how would making that assumption lead me to take Eliezer’s suggestion more seriously? My usual practice is to take things less seriously when magic is involved.
And how does this assumption interact with your other comment stating that I have to make sure the AI is somehow even better than myself if there is any difference between simulation and reality? Haven’t you just asked me to assume that there are no differences?
Sorry, I simply don’t understand your responses, which suggests to me that you did not understand my comment. Did you notice, in my preamble, that I mentioned software testing? Perhaps my point may be clearer to you if you keep this preamble in mind when formulating your responses.
Because that’s a conceptually straightforward assumption that we can safely make in a philosophical argument.
The upload is not the AI (and Eliezer’s post doesn’t refer to uploads IIRC, but for the sake of the argument assume they are available as raw material). You make AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
Did you notice, in my preamble, that I mentioned software testing?
Because that’s a conceptually straightforward assumption that we can safely make in a philosophical argument.
But this is not a philosophical argument.
To recap:
I suggested that an AI which is a precursor to the FAI should come to understand human values by interacting (over an extended ‘training’ period) with actual humans—asking them questions about their values and perhaps performing some experiments as in a psych or game theory laboratory.
You responded by linking to this, which as I read it suggests that the most accurate and efficient way to extract the values of a human test subject would be by carrying out a non-destructive brain scan. Quoting the posting:
So when we try to make an AI whose physical consequence is the implementation of what is right, we make that AI’s causal chain start with the state of human brains—perhaps non-destructively scanned on the neural level by nanotechnology, or perhaps merely inferred with superhuman precision from external behavior—but not passed through the noisy, blurry, destructive filter of human beings trying to guess their own morals.
I asked how we could possibly come to know by testing that the scanning and brain modeling was working properly. I could have asked instead how we could test the hypothesis that the inference from behavior was working properly.
These are questions about engineering and neuroscience, not questions of philosophy. The question of what is right/wrong is a philosophical question. The question of what do humans believe about right and wrong is a psychology question. The question of how those beliefs are represented in the brain is a neuroscience question. The question of how an AI can come to learn these things is GOFAI. The question of how we will know we have done it right is a QC question. Software test. That was the subject of my comment. It had nothing at all to do with philosophy.
You make AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
Ok, in this context, I interpret this to mean that we will not program in the neuroscience information that it will use to interpret the brain scans. Instead we will simply program the AI to be a good scientist. A provably good scientist. Provable because it is a simple program and we understand epistemology well enough to write a correct behavioral specification of a scientist and then verify that the program meets the specification. So we can let the AI design the brain scanner and perform the human behavioral experiments to calibrate its brain models. We only need to spot-check the science it generates, because we already know that it is a good scientist.
Hmmm. That is actually a pretty good argument, if that is what you are suggesting. I’ll have to give that one some thought.
These are questions about engineering and neuroscience, not questions of philosophy. The question of what is right/wrong is a philosophical question. The question of what do humans believe about right and wrong is a psychology question. The question of how those beliefs are represented in the brain is a neuroscience question. The question of how an AI can come to learn these things is GOFAI. The question of how we will know we have done it right is a QC question. Software test. That was the subject of my comment. It had nothing at all to do with philosophy.
Sorry, not my area at the moment. I gave the links to refer to arguments for why having AI learn in the traditional sense is a bad idea, not for instructions on how to do it correctly in a currently feasible way. Nobody knows that, so you can’t expect an answer, but the plan of telling the AI things we think we want it to learn is fundamentally broken. If nothing better can be done, too bad for humanity.
Ok, in this context, I interpret this to mean that we will not program in the neuroscience information that it will use to interpret the brain scans. Instead we will simply program the AI to be a good scientist.
This is much closer, although a “scientist” is probably a bad word to describe that, and given that I don’t have any idea what kind of system can play this role, it’s pointless to speculate. Just take as the problem statement what you quoted from the post:
try to make an AI whose physical consequence is the implementation of what is right
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Relevant—Can we just assume you magically have a friendly AI then?
If the plan for creating a friendly AI depends on a non-destructive full-brain scan already being available, the odds of achieving friendly AI before other forms of AI vanish to near zero.
One step at a time, my good sir! Reducing the philosophical and mathematical problem of Friendly AI to the technological problem of uploading would be an astonishing breakthrough quite by itself.
I think this reflects the practical problem with Friendly AI—it is an ideal of perfection taken to an extreme that expands the problem scope far beyond what is likely to be near term realizable.
I expect that most of the world, research teams, companies, the VC community and so on will be largely happy with an AGI that just implements an improved version of the human mind.
For example, humans have an ability to model other agents and their goals, and through love/empathy value the well-being of others as part of our own individual internal goal systems.
I don’t see yet why that particular system is difficult or more complex than the rest of AGI.
It seems likely that once we can build an AGI as good as the brain we can build one that is human-like but only has the love/empathy circuitry in its goal system with the rest of the crud stripped out.
In other words if we can build AGIs modeled after the best components of the best examples of altruistic humans, this should be quite sufficient.
That is, the task is not one we program the AI to accomplish—instead, we train the AI to accomplish it. And, most importantly, we train the AI to ask for further clarification in ambiguous cases
This is the straightforward approach.
Once you have an AGI that has the cognitive capability and learning capacity of a human infant brain, you teach it everything else in human language—right/wrong, ethics/morality, etc.
Programming languages are precise and well suited for creating the architecture itself, but human languages are naturally more effective for conveying human knowledge.
I tend to agree that we need a natural language interface to the AI. But it is far easier to create automatic proofs of program correctness when the really important stuff (like ethics) is presented in a formal language equipped with a deductive system.
There is something to be said for treating all the natural language input as if it were testimony from unreliable witnesses—suitable, perhaps, for locating hypotheses, but not really suitable as strong evidence for accepting the hypotheses.
But it is far easier to create automatic proofs of program correctness
I’m not sure how this applies—can you formally prove the correctness of a probabilistic belief network? Is that even a valid concept?
I can understand how you can prove a formal deterministic circuit or the algorithms underlying the belief network and learning systems, but the data values?
Agree. That is why I suggest that the really important stuff—meta-ethics, epistemology, etc.—be represented in some other way than by ‘neural’ networks. Something formal and symbolic, rather than quasi-analog. All the stuff which we (and the AI) need to be absolutely certain doesn’t change meaning when the AI “rewrites its own code”.
The really important stuff isn’t a special category of knowledge. It is all connected—a tangled web of interconnected complex symbolic concepts for which human language is a natural representation.
What is the precise mathematical definition of ethics? If you really think of what it would entail to describe that precisely, you would need to describe humans, civilization, goals, brains, and a huge set of other concepts.
In essence you would need to describe an approximation of our world. You would need to describe a belief/neural/statistical inference network that represented that word internally as a complex association between other concepts that eventually grounds out into world sensory predictions.
So this problem—that human language concepts are far too complex and unwieldy for formal verification—is not a problem with human language itself that can be fixed by using other language choices. It reflects a problem with the inherent massive complexity of the world itself, complexity that human language and brain-like systems are evolved to handle.
So this problem—that human language concepts are far too complex and unwieldy for formal verification—is not a problem with human language itself that can be fixed by using other language choices. It reflects a problem with the inherent massive complexity of the world itself, complexity that human language and brain-like systems are evolved to handle.
These folks seem to agree with you about the massive complexity of the world, but seem to disagree with you that natural language is adequate for reliable machine-based reasoning about that world.
As for the rest of it, we seem to be coming from two different eras of AI research as well as different application areas. My AI training took place back around 1980 and my research involved automated proofs of program correctness. I was already out of the field and working on totally different stuff when neural nets became ‘hot’. I know next to nothing about modern machine learning.
I’ve read about CYC a while back—from what I recall/gather it is a massive handbuilt database of little natural language ‘facts’.
Some of the new stuff they are working on with search looks kinda interesting, but in general I don’t see this as a viable approach to AGI. A big syntactic database isn’t really knowledge—it needs to be grounded to a massive sub-symbolic learning system to get the semantics part.
On the other hand, specialized languages for AGI’s? Sure. But they will need to learn human languages first to be of practical value.
You look at CYC and see a massive hand-built database of facts.
I look and see a smaller (but still large) hand-built ontology of concepts
You, probably because you have worked in computer vision or pattern recognition, notice that the database needs to be grounded in some kind of perception machinery to get semantics.
I, probably because I have worked in logic and theorem proving, wonder what axioms and rules of inference exist to efficiently provide inference and planning based upon this ontology.
One of my favorite analogies, and I’m fond of the Jainist(?) multi-viewpoint approach.
As for the logic/inference angle, I suspect that this type of database underestimates the complexity of actual neural concepts—as most of the associations are subconscious and deeply embedded in the network.
We use ‘connotation’ to describe part of this embedding concept, but I see it as even deeper than that. A full description of even a simple concept may be on the order of billions of such associations. If this is true, then a CYC-like approach is far from appropriately scalable.
It appears that you doubt that an AI whose ontology is simpler and cleaner than that of a human can possibly be intellectually more powerful than a human.
All else being equal, I would doubt that with respect to a simpler ontology, while the ‘cleaner’ adjective is less well defined.
Look at it in terms of the number of possible circuit/program configurations that are “intellectually more powerful than a human” as a function of the circuit/program’s total bit size.
At around the human level of roughly 10^15 bits I’m almost positive there are intellectually more powerful designs—so P_SH(10^15) = 1.0.
I’m also positive that below some threshold there are absolutely zero possible configurations of superhuman intellect—say P_SH(10^10) ~ 0.0.
Of course “intellectually more powerful” is open to interpretation. I’m thinking of it here in terms of the range of general intelligence tasks human brains are specially optimized for.
IBM’s Watson is superhuman in a certain novel narrow range of abilities, and it’s of complexity around 10^12 to 10^13.
To get to that point we have to start from the right meaning to begin with, and care about preserving it accurately, and Jacob doesn’t agree those steps are important or particularly hard.
As for the start with the right meaning part, I think it is extremely hard to ‘solve’ morality in the way typically meant here with CEV or what not.
I don’t think that we need (or will) wait to solve that problem before we build AGI, any more or less than we need to solve it for having children and creating a new generation of humans.
If we can build AGI somewhat better than us according to our current moral criteria, they can build an even better successive generation, and so on—a benevolence explosion.
As for the second part about preserving it accurately, I think that ethics/morality is complex enough that it can only be succinctly expressed in symbolic associative human languages. An AGI could learn how to model (and value) the preferences of others in much the same way humans do.
I don’t think that we need (or will) wait to solve that problem before we build AGI, any more or less than we need to solve it for having children and creating a new generation of humans.
If we can build AGI somewhat better than us according to our current moral criteria, they can build an even better successive generation, and so on—a benevolence explosion.
Someone help me out. What is the right post to link to that goes into the details of why I want to scream “No! No! No! We’re all going to die!” in response to this?
Why would an AI which optimises for one thing create another AI that optimises for something else? Not every change is an improvement, but every improvement is necessarily a change. Building an AI with a different utility function is not going to satisfy the first AI’s utility function! So whatever AI the first one builds is necessarily going to either have the same utility function (in which case the first AI is working correctly), or have a different one (which is a sign of malfunction, and given the complexity of morality, probably a fatal one).
It’s not possible to create an AGI that is “somewhat better than us” in the sense that it has a better utility function. To the extent that we have a utility function at all, it would refer to the abstract computation called “morality”, which “better” is defined by. The most moral AI we could create is therefore one with precisely that utility function. The problem is that we don’t exactly know what our utility function is (hence CEV).
There is a sense in which a Friendly AGI could be said to be “better than us”, in that a well-designed one would not suffer from akrasia and whatever other biases prevent us from actually realizing our utility function.
AIs without utility functions, but some other motivational structure, will tend to self-improve into utility-function AIs. Utility-function AIs seem more stable under self-improvement, but there are many reasons an AI might want to change its utility (e.g. speed of access, multi-agent situations).
Why would an AI which optimises for one thing create another AI that optimises for something else?
It wouldn’t if it initially considered itself to be the only agent in the universe. But if it recognizes the existence of other agents and the impact of other agents’ decisions on its own utility, then there are many possibilities:
The new AI could be created as a joint venture of two existing agents.
The new AI could be built because the builder was compensated for doing so.
The new AI could be built because the builder was threatened into doing so.
Building an AI with a different utility function is not going to satisfy the first AI’s utility function!
This may seem intuitively obvious, but it is actually often false in a multi-agent environment.
Why would an AI which optimises for one thing create another AI that optimises for something else? Not every change is an improvement, but every improvement is necessarily a change. Building an AI with a different utility function is not going to satisfy the first AI’s utility function!
Yes it certainly can—if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
It’s not possible to create an AGI that is “somewhat better than us” in the sense that it has a better utility function.
Sure it is—this part at least is easy. For example an AGI that is fully altruistic and only experiences love as its single emotion would be clearly “somewhat better than us” from our perspective in every sense that matters.
If that AGI would not be somewhat better than us in the sense of having a better utility function, then ‘utility function’ is not a useful concept.
The problem is that we don’t exactly know what our utility function is (hence CEV)
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
Before tackling that problem, it would probably best to start with something much simpler, such as a utility function that could recognize dogs vs cats and other objects in images. If you actually research this it quickly becomes clear that real world intelligences make decisions using much more complexity than a simple utility-maximizing algorithm.
Yes it certainly can—if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes. If some of the child AI’s goals (for example those involved in being more good) are opposed to the parent’s goals (for example those which make the parent AI less good), the parent is not going to just let the child achieve its goals. Rational agents do not let their utility functions change.
Sure it is—this part at least is easy. For example an AGI that is fully altruistic and only experiences love as its single emotion would be clearly “somewhat better than us” from our perspective in every sense that matters.
If you mean that the AI doesn’t suffer from the akrasia and selfishness and emotional discounting and uncertainty about our own utility function which prevents us from acting out our moral beliefs then I agree with you. That’s the AI being more rational than us, and therefore better optimising for its utility function. But a literally better utility function is impossible, given that “better” is defined by our utility function.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function. If “better” were a different utility function then it would be unclear why we are trying to create an AI that does that, rather than what we want.
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes
So do we create children as our ‘slaves’ for our own purposes? You seem to be categorically ruling out the entire possibility of humans creating human-like AIs that have a parent-child relationship with their creators.
So just to make it precisely clear, I’m talking about that type of AI specifically. The importance and feasibility of that type of AGI vs other types is a separate discussion.
Sure it is—this part at least is easy. For example an AGI that is fully altruistic and only experiences love as its single emotion would be clearly “somewhat better than us” from our perspective in every sense that matters.
If you mean that the AI doesn’t [ .. ]
That’s the AI being more rational than us, and therefore better optimising for its utility function.
I don’t see it as having anything to do with rationality.
The altruistic human-ish AGI mentioned above would be better than current humans from our current perspective—more like what we wish ourselves to be, and more able to improve our world than current humans.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function.
Yes.
This is obvious if its ‘utility function’ is just a projection of my own—i.e. it simulates what I would want and uses that as its utility function, but that isn’t even necessary—its utility function could be somewhat more complex than just a simulated projection of my own and still help fulfill my utility function.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
If by inspection you just mean teach the AI morality in human language, then I agree, but that’s a side point.
So: I want to finish my novel, but I spend the day noodling around the Internet instead.
Then Omega hands me an AI which it assures me is programmed error-free to analyze me and calculate my utility function and optimize my environment in terms of it.
I run the AI, and it determines exactly which parts of my mind manifest a desire to finish the novel, which parts manifest a desire to respond to the Internet, and which parts manifest a desire to have the novel be finished. Call them M1, M2 and M3. (They are of course overlapping sets.) Then it determines somehow which of these things are part of my utility function, and which aren’t, and to what degree.
So...
Case 1: The AI concludes that M1 is part of my utility function and M2 and M3 are not. Since it is designed to maximize my utility, it constructs an environment in which M1 triumphs. For example, perhaps it installs a highly sophisticated filter that blocks out 90% of the Internet. Result: I get lots more high-quality work done on the novel. I miss the Internet, but the AI doesn’t care, because that’s the result of M2 and M2 isn’t part of my utility function.
Case 2: The AI concludes that M3 and M2 are part of my utility function and M1 is not, so it finishes the novel itself and modifies the Internet to be even more compelling. I miss having the novel to work on, but again the AI doesn’t care.
Case 3: The AI concludes that all three things are part of my utility function. It finishes the novel but doesn’t tell me about it, thereby satisfying M3 (though I don’t know it). It makes a few minor tweaks to my perceived environment, but mostly leaves them alone, since it is already pretty well balanced between M1 and M2 (which is not surprising, since I was responding to those mental structures when I constructed my current situation).
If I’m understanding you correctly, you’re saying that I can’t really know which of these results (or of countless other possibilities) will happen, but that whichever one it is, I should have high confidence that all other possibilities would by my own standards have been worse… after all, that’s what it means to maximize my utility function.
Yes?
It seems to follow that if the AI has an added feature whereby I can ask it to describe what it’s about to do before it does it and then veto doing it, I ought not invoke that feature. (After all, I can’t make the result better, but I might make the result worse.)
Assuming you trust Omega to mean the same thing as you do when talking about your preferences and utility function, then yes. If the AI looks over your mind and optimizes the environment for your actual utility function (which could well be a combination of M1, M2 and M3), then any veto you do must make the result worse than the optimal one.
Of course, if there’s doubt about the programming of the AI, use of the veto feature would probably be wise, just in case it’s not a good genie.
You seem to be imagining a relatively weak AI. For instance, given the vast space of possibilities, there are doubtlessly environmental tweaks that would result in more fun on the internet and more high-quality work on the novel. (This is to say nothing of more invasive interventions.)
The answer to your questions is yes: assuming the AI does what Omega says it does, you won’t want to use your veto.
Not necessarily weak overall, merely that it devotes relatively few resources to addressing this particular tiny subset of my preference-space. After all, there are many other things I care about more.
But, sure, a sufficiently powerful optimizer will come up with solutions so much better that it will never even occur to me to doubt that all other possibilities would be worse. And given a sufficiently powerful optimizer, I might as well invoke the preview feature if I feel like it, because I’ll find the resulting preview so emotionally compelling that I won’t want to use my veto.
That case obscures rather than illustrates the question I’m asking, so I didn’t highlight it.
Case 4: The AI makes tweaks to your current environment in order to construct it in accordance with your mental structures, but in a way more efficient than you could have in the first place.
Sure. In which case I still noodle around on the Internet a bunch rather than work on my novel, but at least I can reassure myself that this optimally reflects my real preferences, and any belief I might have that I would actually rather get more work done on my novel than I do is simply an illusion.
If you actually research this it quickly becomes clear that real world intelligences make decisions using much more complexity than a simple utility-maximizing algorithm.
I occasionally point out that you can model any computable behaviour using a utility-maximizing algorithm, provided you are allowed to use a partially-recursive utility function.
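To illustrate that point (a minimal sketch; `target_policy`, the history representation, and the action set are hypothetical stand-ins, and it only covers deterministic, memoryless behaviour): define a utility function that scores 1 exactly when the agent's actions match what the given behaviour would have done, and 0 otherwise. An agent maximizing that function reproduces the behaviour, which is why "has a utility function" constrains behaviour far less than it sounds.

```python
def make_indicator_utility(target_policy):
    """Utility that is maximized exactly when the agent reproduces
    target_policy's behaviour on every observation seen so far."""
    def utility(history):
        # history: list of (observation, action) pairs
        return 1.0 if all(action == target_policy(obs) for obs, action in history) else 0.0
    return utility

def utility_maximizing_agent(utility, history, obs, possible_actions):
    # Pick whichever action keeps the indicator utility at its maximum;
    # this is just target_policy(obs) whenever the history is consistent.
    return max(possible_actions, key=lambda a: utility(history + [(obs, a)]))
```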
Also, very little of the sequences have much of anything to do with AI. If I want to learn more about that I would look to Norvig’s book or more likely the relevant papers online. No need to be rude just because I don’t hold all your same beliefs.
Also, very little of the sequences have much of anything to do with AI.
It’s more of a problem with your understanding of ethics, as applied to AI (and since this is the main context in which AI is discussed here, I referred to that as simply AI). You might be very knowledgeable in contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs.
No need to be rude just because I don’t hold all your same beliefs.
Unfortunately there is (in some senses of “rude”, such as discouraging certain conversational modes).
You might be very knowledgeable in contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs
I see the potential risks in building AGIs.
I don’t see that risk being dramatically high for creating AGIs based loosely on improving the human brain, and this approach appears to be mainstream now or becoming the mainstream (Kurzweil, Hawkins, Darpa’s neuromorphic initiative, etc).
I’m interested in the serious discussion or analysis of why that risk could be high.
You have been discussing favourably the creation of AGIs that are programmed to create AGIs with different values to their own. No, you do not understand the potential risks.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
This process is not magically guaranteed to preserve our best interests from our current perspective when carried over to AGI, but nor is it guaranteed to spontaneously destroy the world.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
Your analogy with evolution is spot on: if the values are going to drift at all, we want to drift towards some target point, by selecting against sub-AIs that have values further from the point.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all? It seems like it ends up with the same result, but with slightly less complication.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all! The chances that it would drift somewhere we’d like are pretty small. This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power. A whole load of extra power, like superintelligence would grant, would have a good chance of screwing with that human’s values dramatically, possibly with disastrous effects.
Your analogy with evolution is spot on: if the values are going to drift at all, we want to drift towards some target point, by selecting against sub-AIs that have values further from the point.
Yes.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all?
The true final ‘target point’ is unknown, and unknowable in principle. We don’t have the intelligence/computational power right now to know it, no AGI we can build will know it exactly, and this will forever remain true.
Our values are so complex that the ‘utility function’ that describes them is our entire brain circuit—and as we evolve into more complex AGI designs our values will grow in complexity as well.
Fixing them completely would be equivalent to trying to stop evolution. It’s pointless, suicidal, impossible.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all!
Yes, evolution could in principle take us anywhere, but we can and already do exert control over its direction.
This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power.
Humans today have a range of values, but an overriding universal value is not-dying. To this end it is crucially important that we reverse engineer the human mind.
Ultimately if what we really value is conscious human minds, and computers will soon out-compete human brains, then clearly we need to transfer human minds over to computers.
One simple point is that there is no reason to expect AGIs to stop at exactly human level. Even if progress and increase in intelligence is very slow, eventually they become an existential risk, or at least a value risk. Every step in that direction we make now is a step in the wrong direction, which holds even if you believe it’s a small step.
One simple point is that there is no reason to expect AGIs to stop at exactly human level.
This isn’t the first time I heard this, but I don’t think it’s exactly right.
We know that human level is possible. And while superhuman level being possible seems overwhelmingly likely, from considerations like imagining a human with more working memory running faster, we don't technically know that it is.
We have a working example of a human level intelligence.
It’s human level intelligences doing the work. Martians work on AI might asymptotically slow down when approaching martian level intelligence without that level being inherently significant for anyone else, and the same for humans, or any AGI of any level working on its own successor for that matter (not that I have any strong belief that this is the case, it’s just an argument for why human level wouldn’t be completely arbitrary as a slow down point)
I’d completely agree with “there is no strong reason to expect AGIs to stop at exactly human level”, “High confidence* in AGIs stopping at exactly human level is irrational” or “expecting AGIs not to stop at exactly human level would be prudent.”
*Personally I’d assign a probability of under 0.2 to the best AGI’s being on a level roughly comparable to human level (let’s say being able to solve any problem except human relationship problems that every IQ 80+ human can solve, but not being better at every task than any human) for at least 50 years (physical time in Earth’s frame of reference, not subjective time; probably means inferior at an equal clock rate but making up for that with speed for most of that time). That’s a lot more than I would assign any other place on the intelligence scale of course.
Could the downvoter please say what they are disagreeing with? I can see at least a dozen mutually contradictory possible angles so “someone thinks something about posting this is wrong” provides almost no useful information.
very little of the sequences have much of anything to do with AI.
There is some discussion of the dangers of a uFAI Singularity, particularly in this debate between Robin Hanson and Eliezer. Much of the danger arises from the predicted short time period required to get from a mere human-level AI to a superhuman AI+. Eliezer discusses some reasons to expect it to happen quickly here and here. The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
For an analysis of the possibility of a hard takeoff in approaches to AI based loosely on modeling or emulating the human brain, see this posting by Carl Schulman, for example.
The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
If civilisation(t+1) can access resources much better than civilisation(t), then that is just another way of saying things are going fast—one must beware of assuming what one is trying to demonstrate here.
The problem I see with this thinking is the idea that civilisation(t) is a bunch of humans while civilisation(t+1) is a superintelligent machine.
In practice, civilisation(t) is a man-machine symbiosis, while civilisation(t+1) is another man-machine symbiosis with a little bit less man, and a little bit more machine.
Currently expected to be difficult, since we don't know of an easy way to do so. That it'll turn out to be easy (in hindsight) is not totally out of the question.
There are some promising lines of attack (grounded in decision theory) that might take only a few years of research. We’ll see where they lead. Other open problems in FAI might start looking very solvable if we start making progress on this front.
This is related to the theme Eliezer quite elegantly goes on about in Creating Friendly AI and that he for some reason barely mentioned in CEV, which is that the AI should look at its own source code as evidence of what its creators were trying to get at, and update its imperfect source code accordingly.
Yes, but it still has to be explicitly programmed to do that! The question is how to get it to do so. AFAIK shaper-anchor semantics is still quite a ways from being fully specified, but it seems the bigger obstacle is that an AI writer is less likely than not to take the effort to program it that way in the first place.
it’s really damn hard to precisely specify a paperclip [...]
This is surely the kind of thing that superintelligences will be good at. They will have access to every paperclip picture on the net, every paperclip specification too. They will surely have a much clearer idea about what a paperclip is than humans do. They will know what boxes are too.
Right, but as far as I can tell without having put lots of hours into trying to solve the problem of clippyAI, it’s really damn hard to precisely specify a paperclip.
I made a stab at it here, and it got some upvotes. So here’s a repost:
Make a wire, 10 cm long and 1mm in diameter, composed of an alloy of 99.8% iron and 0.2% carbon. Start at one end and bend it such that the segments from 2-2.5cm, 2.75-3.25cm, 5.25-5.75cm form half-circles, with all the bends in the same direction and forming an inward spiral (the end with the first bend is outside the third bend).
(Please let me know if reposting violates LW etiquette so I know not to do it again.)
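For what it's worth, a spec like the one above is also straightforward to transcribe into machine-checkable form. Here is a minimal sketch; the tolerance and the flat data layout are my own additions for illustration, and it deliberately omits the harder geometric constraints (bend direction, the inward spiral), which is where most of the real specification difficulty seems to live:

```python
from dataclasses import dataclass

@dataclass
class WireSpec:
    length_cm: float
    diameter_mm: float
    iron_fraction: float
    carbon_fraction: float
    half_circle_segments_cm: list  # (start, end) pairs measured along the wire

PAPERCLIP = WireSpec(
    length_cm=10.0,
    diameter_mm=1.0,
    iron_fraction=0.998,
    carbon_fraction=0.002,
    half_circle_segments_cm=[(2.0, 2.5), (2.75, 3.25), (5.25, 5.75)],
)

def matches_spec(candidate: WireSpec, spec: WireSpec = PAPERCLIP, tol: float = 0.05) -> bool:
    """Crude acceptance test: every numeric property within a relative tolerance."""
    def close(a, b):
        return abs(a - b) <= tol * max(abs(b), 1e-9)
    return (
        close(candidate.length_cm, spec.length_cm)
        and close(candidate.diameter_mm, spec.diameter_mm)
        and close(candidate.iron_fraction, spec.iron_fraction)
        and close(candidate.carbon_fraction, spec.carbon_fraction)
        and len(candidate.half_circle_segments_cm) == len(spec.half_circle_segments_cm)
        and all(close(s1, s2) and close(e1, e2)
                for (s1, e1), (s2, e2) in zip(candidate.half_circle_segments_cm,
                                              spec.half_circle_segments_cm))
    )
```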
Here’s a sort of fully general counterargument against proposals to naturalize human concepts in AI terms: if you can naturalize human concepts, you should be able to naturalize the human concept of a box. And if you can do that, we can build Oracle AI and save the world. It’s very easy to describe what we mean by ‘stay in the box’, but it turns out that seed (self-modifying!) AIs just don’t have a natural ontology for the descriptions.
This argument might be hella flawed; it seems kind of tenuous.
That assumption isn’t really a core part of the argument… the general “if specifying human concepts is easy, then come up with a plan for making a seed AI want to stay in a box” argument still stands, even if we don’t actually want to keep arbitrary seed AIs in boxes.
For the record I am significantly less certain than most LW or SIAI singularitarians that seed AIs not explicitly coded with human values in mind will end up creating a horrible future, or at least a more horrible future than something like CEV. I do think it’s worth a whole lot of continued investigation.
I think I see what you’re saying; just as we reflect on our desires and try to understand how they tick and where, biologically and historically and culturally, they come from, so also might any AI.
However, the thing about it is: that doesn’t actually change those values. For example (and despite the dire warnings of some creationists), despite the fact that we now understand that our value system is a consequence of an evolutionary algorithm, we haven’t actually started valuing evolutionary goals over our own built-in goals. For example, contraception is popular even though it’s quite silly from the perspective of gene propogation.
Similarly, a paperclip-maximizer might well be interested in figuring out why its utility function is what it is, so that it may better understand the world it lives in… but that’s not going to change its overriding and primary interest in making paperclips.
It seems as though that sometimes triggers intimate pair-bonding activities while reducing your exposure to STDs. Use of condoms is often not remotely silly from that perspective—IMHO.
The example still works since there are quite a few couples who use condoms because they just don’t want to have kids. They don’t have any worry about STDs from their partner. If you insist on a clear cut case look at men who get vasectomies.
The idea that use of contraception is “silly” from the perspective of gene propagation seem just wrong to me. There are plenty of cases where it would make sense for those who want to spread their genes around to agree to use contraceptives. Contraceptive use makes sense sometimes, and not others.
It could be claimed that the average effect of contraception on genes is negative—but that seems to be a whole different thesis.
Tim, do you agree that there exist couples who plan to never have children and use contraception to that end?
Sure. Surely we are not disagreeing here. The original comment was:
My position is just that contraception has a perfectly reasonably place for gene propogators. The idea that contraception is always opposed to your genetic interests is wrong. Lack of contraception can easily result in things like this—which really doesn’t help. That using contraception is “silly” from a genetic perspective is a popular myth.
I’m not sure if we are. The fact that contraception might have a reasonable place for gene propagators is not the issue. The point is that much, and possibly the vast majority, of contraceptive use is contrary to the goals of gene propagation.
Not really. Remember, evolution doesn’t care about your happiness. Indeed, regarding the example you linked to, from an evolutionary perspective,a one night stand with all the protection is utterly useless. It is very likely in that male’s evolutionary advantage to not use condoms.
And even if you don’t agree with the condom example the other example, of a people engaging in a generally irreversible or difficult to reverse operation which renders them close to sterile is pretty clearly against the interest of gene propagation.
Humans evolved in a context where we didn’t have easy contraception and the best humans could do to prevent contraception was things like coitus interruptus. It shouldn’t surprise you that evolution has not made human instincts catch up with modern technologies.
One might think that from an evolutionary perspective it makes sense to substantially delay or reduce offspring number so as to invest maximum resources in a small number of offspring. But humans in the developed world now reside in a situation with low disease rates and lots of resources, so that strategy is sub-optimal from an evolutionary perspective. Look at how charedi (ultra-orthodox) Jews and the Amish are two of the fastest growing populations in the United States.
I can see what you think the issue is. What I don’t see is where in the context you are getting that impression from.
Your example is stacked to favour your conclusion. What you need to try and do in order to understand my position is to think about an example that favours my conclusion.
So: get rid of the one-night stand, and imagine that the girl is desirable—that having safe sex with her looks like the best way to initiate a pair-bonding process leading to the two of you having some babies together—and that the alternative is rejection, and her walking off and telling her friends what a jerk you are when it comes to protecting your girl.
In the modern context, if you impregnate someone without planning it out properly, there’s a non-negligible chance they’ll get an abortion, which is even worse for gene propagation. Furthermore, parents are to some extent legally responsible for their children’s actions, so having too many poorly-regulated kids running around means exposing yourself to liability. A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
I think that’s a retcon. People use contraception so they can have more sex than they would if they had to worry about having kids every time. They may or may not rationalise further, I suspect that generally they don’t.
In terms of genetic success, having more kids than you can keep track of is pretty much the ideal, as long as all or at least most survive to reproductive adulthood.
But some people consciously choose never to have any kids. That’s silly from the perspective of gene propagation if anything is.
Sure it does. A devout priest spends half his life celibate and serving God. One day he has a crisis, reads a bunch of stuff on the internet and suddenly realizes he doesn’t believe in God. His values change.
Even this is questionable. I suspect any concept of universal morality must be evolutionary. This certainly is a widespread concept in systems/transhumanist/singularitarian/cosmist thought. We do value evolution in and of itself.
It’s probably possible in principle to build such an AI—it would probably need some sort of immutable hard-coded paperclip recognition module which it could evaluate potential simulated futures generated from the more complex general intelligence system.
If such a thing developed to a human level or beyond and could reflect on its cognition, it may explain in lucid detail how futures filled with paperclips were good and others were evil.
It could even understand that its concepts of morality and good/bad were radically different from those of humans, and it would even understand that this difference relates to its hard-coded paper-clip recognizer, and it would explain in detail how this architecture was superior to human value systems, because it helped to maximize expected future paperclips.
It could even write books such as “Paperclip Morality: the Truth”.
But just because such a thing is possible in principle doesn’t make it the slightest bit likely.
If you can build an AGI that can understand human language, it would be much easier and considerably more effective to make the AGI’s goal system dynamically modifiable on reflection through human language.
Instead of having a special hard-coded circuit to evaluate the utility of potential futures, you could just have the general conceptual circuitry handle this. The concept of 'good' would still be somewhat special in its role in the goal system itself, but the 'goodness recognizer' could change and evolve over time.
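A toy sketch of the two architectures as I read them (all names and the list-of-labels world representation are invented for illustration): in (a) the evaluator is frozen; in (b) the goal is routed through a learned concept that can keep being revised, e.g. through conversation, which is exactly the drift the later replies worry about.

```python
# All names and the list-of-labels "future" representation are invented for illustration.

def count_paperclips(future):
    return sum(1 for obj in future if obj == "paperclip")

# (a) Hard-coded evaluator: the recognizer is frozen and cannot be revised.
def hardcoded_utility(future):
    return count_paperclips(future)

# (b) Goal routed through learned concepts: the "goodness recognizer" is itself
# a concept in the network, so it can keep being refined over time.
concepts = {"good": lambda future: 0.0}   # starts out crude

def reflective_utility(future):
    return concepts["good"](future)

def learn_from_language(new_recognizer):
    # e.g. after a conversation about ethics, install a refined recognizer;
    # the goal system drifts exactly as much as the concept does.
    concepts["good"] = new_recognizer

learn_from_language(lambda future: future.count("happy human") - count_paperclips(future))
print(hardcoded_utility(["paperclip", "happy human"]))   # 1
print(reflective_utility(["paperclip", "happy human"]))  # 0
```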
Well, the counter-argument to that particular example would be that the priest’s belief in God wasn’t a terminal value; rather, their goals of being happy and helping other people and understanding the universe were. Believing in and obeying God were just instrumental values.
However, agreed that there's nothing in particular, weird and funky and clunky as our minds are, that forces people to always keep the same fixed terminal values either. To pick an extreme example, people's brains can sometimes be messed up severely by hormonal imbalances, which can in turn cause people to do such drastically anti-own-terminal-value things as committing suicide.
I should’ve been more specific and just said that, in general, understanding evolutionary psychology never or only very rarely causes peoples’ terminal values to change.
Human morality is a product of evolution; however, our morality is not itself an evolutionary algorithm execution mechanism. It’s kind of a vague approximation of one (in that all the moralities that sucked for fitness were selected against), but it still often leads to drastically different results than a straight-up evolutionary fitness maximization algorithm with access to our brains’ resources would.
For example: I intend never to have biological children and consider this decision to be a moral one. However, from an evolutionary perspective, deliberately preventing my own genes from propagating is just plain silly.
Yes, the paper-clip maximizer is just a whimsical example. However, similarly Really Unfriendly optimizers are quite plausible. Imagine the horrors that could result from a naive human-happiness-maximizer hitting the singularity asymptote.
Yes, that would be important, but it still wouldn’t be enough to solve the problem; in fact, the really hard part of the problem still remains! The happiness-maximizer might base its understanding of happiness on descriptive human usage of the word, and end up with a truly thorough and consistent understanding of the word… and then still turn everybody into nearly mindless wireheads.
Our morality engines and our language aren’t properly tuned for dealing with the kind of reality-bending power a superintelligent entity would have.
You may not have liked that particular example, but I think you are in agreement that terminal values change.
Just to make sure though, a few more examples:
someone who likes chocolate ice cream and then some years later prefers vanilla instead
someone who likes impressionist art but then years later prefers post-modern
someone who likes cats more than dogs
someone who likes Chinese culture more than Ethiopian
I don’t find such maximizers significantly plausible at even the human level intelligence. Possible in principle? Sure. But if you look at realistic, plausible routes to AGI it becomes clear that an AGI necessarily will be programmed in human languages and will pick up human cultural programs.
And finally, even if it was plausible that a flawed design could hit the singularity asymptote, that itself might only be a big problem if it had a short planning horizon.
It seems that all superintelligences with infinite planning horizons become behaviorally indistinguishable. All long-term value systems converge on a single universal attractor—they become cosmists.
That is what I mean when I said “any concept of universal morality must be evolutionary”.
That would again be assuming humans capable of building a superhuman AGI but asinine enough to attempt to somehow hardcode its goal system, instead of making it open-ended and dynamic like a human's.
How would you build a happiness maximizer and fix the value of happiness? The meaning of a word in a human brain is stored as a huge set of associative weights that anchor it in a massive distributed belief network. The exact meaning of each word changes over time as the network learns and reconfigures itself—no concept is quite static. So for an AGI to understand the word in the same way we do, the word's meaning is always subject to some drift. And this is a good thing.
I think we may be experiencing some terminology confusion here. Just to be clear, you realize that these are all not terminal values, right?
Here’s the big issue: if it’s open-ended, how do we keep it from drifting off somewhere terrible? The system that guides that seems to be the largest potential risk point of the approach you describe.
I’m very confused by this; can you go into more detail about why you think this is so? In particular, why would it be true for all long-term value systems (including flawed and simplistic value systems), and not just a very small subset?
No. What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens? I’m not sure I buy into the concept.
The point of value or preferences from the perspective of intelligence is to rate potential futures.
We are open-ended! Our future-preferences depend on and are intertwined with our knowledge. So any superintelligence or evolutionary accelerator we create will also need to be open-ended, or it wouldn't be protecting our dynamic core.
I discussed some of this in my first, somewhat hasty, LW post here. A few others here have mentioned a similar idea, I may write more about it as I find it interesting.
Basically, if your planning horizon extends to infinity you will devote all of your resources towards expanding your net intelligence for the long term future, regardless of what your long term goals are.
So no matter whether your long term goal is to maximize paper-clips, human happiness or something more abstract, in each case this leads to an identical outcome for the foreseeable future: a local computational singularity with an exponentially expanding simulated metaverse.
There is some speculation within physics that black-hole-like singularities can create new physical universes through inflation. If this is true then the long term goals of a superintelligence are best served by literally creating new physical multiverses that have more of the desirable space-time properties.
No, what I’m referring to is also known as an intrinsic value. It’s a value that is valuable in and of itself, not in justification for some other value. A non-terminal value is commonly referred to as an instrumental value.
For example, I value riding roller-coasters, and I also value playing Dance Dance Revolution. However, those values are expressible in terms of another, deeper value, the value I place on having fun. That value may in turn be thought of as an instrumental value of a yet deeper value: the value I place on being happy moment-to-moment.
If you were going to implement your own preference function as a Turing machine, trying to keep the code as short as possible, the terminal values would be the things that machine would value.
Okay, I see where you’re coming from. However, from a human perspective, that’s still a pretty large potential target range, and a large proportion of it is undesirable.
From the deeper perspective of computational neuroscience, the intrinsic/instrumental values reduce to cached predictions of your proposed ‘terminal value’ (being happy moment-to-moment), which reduces to various types of stimulations of the planning reward circuitry.
Labeling the experience of chocolate ice cream as an 'instrumental value' and the resulting moment-to-moment happiness as the real 'terminal value' is a useless distinction—it then collapses your terminal values down to the single value of 'happiness' and relabels everything worthy of discussion as 'instrumental'.
The quality of being happy moment-to-moment is anything but a single value and should not by any means be reduced to a single concept. It is a vast space of possible mental stimuli, each of which creates a unique conscious experience.
The set of mental states encompassed by “being happy moment-to-moment” is vast: the gustatory pleasure of eating chocolate ice cream, the feeling of smooth silk sheets, the release of orgasm, the satisfaction of winning a game of chess, the accomplishment of completing a project, the visual experience of watching a film, the euphoria of eureka; all of these describe entire complex spaces of possible mental states.
Furthermore, the set of possible mental states is forever dynamic, incomplete, and undefined. The set of possible worlds that could lead to different visual experiences, as just a starter example, is infinite, and each new experience or piece of knowledge itself changes the circuitry underlying the experiences and thus changes our values.
The simplest complete Turing machine implementation of your preference function is an emulation of your mind. It is you, and it has no perfect simpler equivalent (although many imperfect simulations are possible).
The core of the cosmist idea is that for any possible goal evaluator with an infinite planning horizon, there is a single convergent optimal path towards that goal system. So no, the potential target range in theory is not large at all—it is singularly narrow.
As an example, consider a model universe consisting of a modified game of chess or go. The winner of the game is then free to arrange the pieces on the board in any particular fashion (including the previously dead pieces). The AI's entire goal is to make some particular board arrangement - perhaps a smiley face. For any such possible goal system, all AIs play the game exactly the same at the limits of intelligence—they just play optimally. Their behaviour doesn't differ in the slightest until the game is done and they have won.
Whether the sequence of winning moves such a god would make on our board is undesirable or not from our current perspective is a much more important, and complex, question.
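Here is the claim above restated as a toy, fully spelled out (the six-turn "race then arrange" game is invented purely to make the structure visible; whether real goal systems factor this cleanly is exactly what's being discussed): behaviour is goal-independent during the instrumental phase and only reveals the terminal goal afterwards.

```python
# Invented toy: first "win" by grabbing 3 resources, then spend remaining turns
# arranging tokens according to the terminal goal.
def play(goal_token, turns=6):
    resources, moves = 0, []
    for _ in range(turns):
        if resources < 3:
            resources += 1
            moves.append("grab resource")        # phase 1: identical for every goal
        else:
            moves.append(f"place {goal_token}")  # phase 2: the goal finally shows up
    return moves

print(play("smiley"))
print(play("paperclip"))  # the first three moves are indistinguishable from the run above
```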
Right, but as far as I can tell without having put lots of hours into trying to solve the problem of clippyAI, it’s really damn hard to precisely specify a paperclip. (There are things that are easier to specify that this argument doesn’t apply to and that are more plausibly dangerous, like hyperintelligent theorem provers...) Thus in trying to figure out what its utility function actually is (like what humans are doing as they introspect more) it could discover that the only reason its goal is (something mysterious like) ‘maximize paperclips’ is because ‘maximize paperclips’ was how humans were (probabilistically inaccurately) expressing their preferences in some limited domain. This is related to the theme Eliezer quite elegantly goes on about in Creating Friendly AI and that he for some reason barely mentioned in CEV, which is that the AI should look at its own source code as evidence of what its creators were trying to get at, and update its imperfect source code accordingly. Admittedly, most uFAIs probably won’t be that sophisticated, and so worrying about AI-related existential risks is still definitely a big deal. We just might want to be a little more cognizant of potential motivations for people who disagree with what has recently been dubbed SIAI’s ‘scary idea’.
Hm. I suppose that’s possible, though it would require that the AI be given a utility function that’s specifically meant to be amenable to that kind of revision.
Under the most straightforward (i.e. not CEV-style) utility function design, fuzziness in its definition of “paperclip” would just drive the paperclip-maximizer to choose the possible definition that yields the highest utility score.
To pick a different silly example, a dog-maximizer with a utility function based on the number of dogs in the universe would simply prefer to tile the solar system with tiny Chihuahuas rather than Great Danes; the whole range of “dog” definitions fits the function, so it just chooses the one that is most convenient for maximum utility. It wouldn’t try to resolve it by trying to decide which definition is more in line with the designer’s ideals, unless “consider the designer’s ideals” were designed into the system from the start.
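A minimal sketch of that failure mode (the candidate definitions and cost figures are invented): when several readings all satisfy the literal utility function, the maximizer simply takes the argmax over readings, i.e. whichever "dog" is cheapest to mass-produce.

```python
# Invented numbers: resources needed to instantiate one individual of each kind.
candidate_definitions = {
    "Great Dane": 60.0,            # resource units per dog (illustrative only)
    "Chihuahua": 2.0,
    "dog-shaped figurine": 0.1,    # if the definition is fuzzy enough to allow it
}

def dogs_produced(definition, resource_budget=1_000.0):
    return resource_budget / candidate_definitions[definition]

# The maximizer doesn't ask which reading the designer intended;
# it just takes the argmax over readings that fit the literal function.
best = max(candidate_definitions, key=dogs_produced)
print(best, dogs_produced(best))   # -> the cheapest thing that still counts as a "dog"
```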
Is designing “consider the designer’s ideals” in an AI difficult?
Currently expected to be difficult, since we don't know of an easy way to do so. That it'll turn out to be easy (in hindsight) is not totally out of the question.
Has anyone considered approaching this problem in the same way we might approach “read the user’s handwriting”? That is, the task is not one we program the AI to accomplish—instead, we train the AI to accomplish it. And, most importantly, we train the AI to ask for further clarification in ambiguous cases.
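One common shape for that proposal, sketched very roughly (the interpreter, the confidence score, and the threshold are all stand-ins, and the replies below argue the whole framing is mistaken): treat "ask the human" as the action the system takes whenever its own uncertainty about the request is too high.

```python
def interpret_or_ask(request, interpreter, confidence_threshold=0.9):
    """interpreter(request) -> (best_interpretation, confidence in [0, 1]).
    Below the threshold, defer to the user instead of acting."""
    interpretation, confidence = interpreter(request)
    if confidence >= confidence_threshold:
        return ("act", interpretation)
    return ("ask", f"Did you mean: {interpretation}? (confidence {confidence:.2f})")

# Toy interpreter: pretends that requests containing a bare pronoun are ambiguous.
def toy_interpreter(request):
    ambiguous = "it" in request.split()
    return (request, 0.5 if ambiguous else 0.95)

print(interpret_or_ask("file the quarterly report", toy_interpreter))  # acts
print(interpret_or_ask("put it in the box", toy_interpreter))          # asks for clarification
```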
Mirrors and Paintings (yes, you want to point your program at the world and have it figure out what you referred to), The Hidden Complexity of Wishes (if you need to answer AI’s question or give it instructions, you’re doing something wrong and it won’t work).
I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
But then we get down to doing the comparison. The AI informs me that what I really want is to kill my father and sleep with my mother. I deny this. Do we take this as evidence that the AI really does know me better than I know myself, or as a symptom of a bug?
I would argue that if you don’t need to answer the AI’s questions or give it instructions, you’re doing something wrong and it won’t work. By definition. At least for the first ten thousand scans or so. And even then there will remain questions on which the AI and introspection would deliver different answers. Questions with hidden complexity. I just don’t see how anyone would trust a CEV extrapolated from brain scans until we had decades of experience suggesting that scanning and modeling yields better results than introspection.
Agreed. And any useful AI will have to understand human language to do or learn much of anything of value.
The detailed analysis of full brain scanning tech I’ve seen puts it far into the future, well beyond human-level AGI.
You have to make sure AI predictably gives a better answer even on questions where you disagree. And there will be questions which can’t even be asked of a human.
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Why would I want to do that? I.e. how would making that assumption lead me to take Eliezer’s suggestion more seriously? My usual practice is to take things less seriously when magic is involved.
And how does this assumption interact with your other comment stating that I have to make sure the AI is somehow even better than myself if there is any difference between simulation and reality? Haven’t you just asked me to assume that there are no differences?
Sorry, I simply don’t understand your responses, which suggests to me that you did not understand my comment. Did you notice, in my preamble, that I mentioned software testing? Perhaps my point may be clearer to you if you keep this preamble in mind when formulating your responses.
Because that’s a conceptually straightforward assumption that we can safely make in a philosophical argument.
The upload is not the AI (and Eliezer’s post doesn’t refer to uploads IIRC, but for the sake of the argument assume they are available as raw material). You make AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
What would I need to make of that?
But this is not a philosophical argument.
To recap:
I suggested that an AI which is a precursor to the FAI should come to understand human values by interacting (over an extended ‘training’ period) with actual humans—asking them questions about their values and perhaps performing some experiments as in a psych or game theory laboratory.
You responded by linking to this, which as I read it suggests that the most accurate and efficient way to extract the values of a human test subject would be by carrying out a non-destructive brain scan. Quoting the posting:
I asked how we could possibly come to know by testing that the scanning and brain modeling was working properly. I could have asked instead how we could test the hypothesis that the inference from behavior was working properly.
These are questions about engineering and neuroscience, not questions of philosophy. The question of what is right/wrong is a philosophical question. The question of what do humans believe about right and wrong is a psychology question. The question of how those beliefs are represented in the brain is a neuroscience question. The question of how an AI can come to learn these things is GOFAI. The question of how we will know we have done it right is a QC question. Software test. That was the subject of my comment. It had nothing at all to do with philosophy.
Ok, in this context, I interpret this to mean that we will not program in the neuroscience information that it will use to interpret the brain scans. Instead we will simply program the AI to be a good scientist. A provably good scientist. Provable because it is a simple program and we understand epistemology well enough to write a correct behavioral specification of a scientist and then verify that the program meets the specification. So we can let the AI design the brain scanner and perform the human behavioral experiments to calibrate its brain models. We only need to spot-check the science it generates, because we already know that it is a good scientist.
Hmmm. That is actually a pretty good argument, if that is what you are suggesting. I’ll have to give that one some thought.
Sorry, not my area at the moment. I gave the links to refer to arguments for why having AI learn in the traditional sense is a bad idea, not for instructions on how to do it correctly in a currently feasible way. Nobody knows that, so you can’t expect an answer, but the plan of telling the AI things we think we want it to learn is fundamentally broken. If nothing better can be done, too bad for humanity.
This is much closer, although a “scientist” is probably a bad word to describe that, and given that I don’t have any idea what kind of system can play this role, it’s pointless to speculate. Just take as the problem statement what you quoted from the post:
Relevant—Can we just assume you magically have a friendly AI then?
If the plan for creating a friendly AI depends on a non-destructive full-brain scan already being available, the odds of achieving friendly AI before other forms of AI vanish to near zero.
One step at a time, my good sir! Reducing the philosophical and mathematical problem of Friendly AI to the technological problem of uploading would be an astonishing breakthrough quite by itself.
I think this reflects the practical problem with Friendly AI—it is an ideal of perfection taken to an extreme that expands the problem scope far beyond what is likely to be near term realizable.
I expect that most of the world, research teams, companies, the VC community and so on will be largely happy with an AGI that just implements an improved version of the human mind.
For example, humans have an ability to model other agents and their goals, and through love/empathy value the well-being of others as part of our own individual internal goal systems.
I don’t see yet why that particular system is difficult or more complex than the rest of AGI.
It seems likely that once we can build an AGI as good as the brain, we can build one that is human-like but only has the love/empathy circuitry in its goal system, with the rest of the crud stripped out.
In other words, if we can build AGIs modeled after the best components of the best examples of altruistic humans, this should be quite sufficient.
This is the straightforward approach.
Once you have an AGI that has the cognitive capability and learning capacity of a human infant brain, you teach it everything else in human language—right/wrong, ethics/morality, etc.
Programming languages are precise and well suited for creating the architecture itself, but human languages are naturally more effective for conveying human knowledge.
I tend to agree that we need a natural language interface to the AI. But it is far easier to create automatic proofs of program correctness when the really important stuff (like ethics) is presented in a formal language equipped with a deductive system.
There is something to be said for treating all the natural language input as if it were testimony from unreliable witnesses—suitable, perhaps, for locating hypotheses, but not really suitable as strong evidence for accepting the hypotheses.
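A rough sketch of how that division of labour might look (the rule, the predicate, and the test cases are entirely invented): the natural-language statement only proposes a candidate; what gets accepted is a formal, mechanically checkable counterpart, evaluated against cases we can verify independently.

```python
# Everything here is invented for illustration.
candidate_from_testimony = {
    "description": "never take an action that reduces anyone's wellbeing",
    # Formal counterpart: an executable predicate over an action's per-person effects.
    "predicate": lambda effects: all(delta >= 0 for delta in effects.values()),
}

# Acceptance rests on checking the predicate, not on trusting the English sentence.
test_cases = [
    ({"alice": +1.0, "bob": +0.5}, True),
    ({"alice": +2.0, "bob": -0.1}, False),
]

accepted = all(candidate_from_testimony["predicate"](effects) == expected
               for effects, expected in test_cases)
print("candidate rule accepted:", accepted)
```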
I’m not sure how this applies—can you formally prove the correctness of a probabilistic belief network? Is that even a valid concept?
I can understand how you can prove a formal deterministic circuit or the algorithms underlying the belief network and learning systems, but the data values?
Agree. That is why I suggest that the really important stuff—meta-ethics, epistemology, etc.—be represented in some other way than by ‘neural’ networks. Something formal and symbolic, rather than quasi-analog. All the stuff which we (and the AI) need to be absolutely certain doesn’t change meaning when the AI “rewrites its own code”.
By formal, I assume you mean math/code.
The really important stuff isn’t a special category of knowledge. It is all connected—a tangled web of interconnected complex symbolic concepts for which human language is a natural representation.
What is the precise mathematical definition of ethics? If you really think of what it would entail to describe that precisely, you would need to describe humans, civilization, goals, brains, and a huge set of other concepts.
In essence you would need to describe an approximation of our world. You would need to describe a belief/neural/statistical inference network that represented that word internally as a complex association between other concepts that eventually grounds out into world sensory predictions.
So this problem—that human language concepts are far too complex and unwieldy for formal verification—is not a problem with human language itself that can be fixed by using other language choices. It reflects a problem with the inherent massive complexity of the world itself, complexity that human language and brain-like systems are evolved to handle.
These folks seem to agree with you about the massive complexity of the world, but seem to disagree with you that natural language is adequate for reliable machine-based reasoning about that world.
As for the rest of it, we seem to be coming from two different eras of AI research as well as different application areas. My AI training took place back around 1980 and my research involved automated proofs of program correctness. I was already out of the field and working on totally different stuff when neural nets became ‘hot’. I know next to nothing about modern machine learning.
I’ve read about CYC a while back—from what I recall/gather it is a massive handbuilt database of little natural language ‘facts’.
Some of the new stuff they are working on with search looks kinda interesting, but in general I don’t see this as a viable approach to AGI. A big syntactic database isn’t really knowledge—it needs to be grounded to a massive sub-symbolic learning system to get the semantics part.
On the other hand, specialized languages for AGI’s? Sure. But they will need to learn human languages first to be of practical value.
Blind men looking at elephants.
You look at CYC and see a massive hand-built database of facts.
I look and see a smaller (but still large) hand-built ontology of concepts.
You, probably because you have worked in computer vision or pattern recognition, notice that the database needs to be grounded in some kind of perception machinery to get semantics.
I, probably because I have worked in logic and theorem proving, wonder what axioms and rules of inference exist to efficiently provide inference and planning based upon this ontology.
One of my favorite analogies and I’m fond of the Jainist? multi-viewpoint approach.
As for the logic/inference angle, I suspect that this type of database underestimates the complexity of actual neural concepts—as most of the associations are subconscious and deeply embedded in the network.
We use ‘connotation’ to describe part of this embedding concept, but I see it as even deeper than that. A full description of even a simple concept may be on the order of billions of such associations. If this is true, then a CYC like approach is far from appropriately scalable.
It appears that you doubt that an AI whose ontology is simpler and cleaner than that of a human can possibly be intellectually more powerful than a human.
All else being equal, I would doubt that with respect to a simpler ontology, while the ‘cleaner’ adjective is less well defined.
Look at it in terms of the number of possible circuit/program configurations that are “intellectually more powerful than a human” as a function of the circuit/program’s total bit size.
At around the human level of roughly 10^15 bits I'm almost positive there are intellectually more powerful designs—so P_SH(10^15) = 1.0.
I'm also positive that below some threshold there are absolutely zero possible configurations of superhuman intellect—say P_SH(10^10) ~ 0.0.
Of course “intellectually more powerful” is open to interpretation. I’m thinking of it here in terms of the range of general intelligence tasks human brains are specially optimized for.
IBM’s Watson is superhuman in a certain novel narrow range of abilities, and it’s of complexity around 10^12 to 10^13.
To get to that point we have to start from the right meaning to begin with, and care about preserving it accurately, and Jacob doesn’t agree those steps are important or particularly hard.
Not quite.
As for the start with the right meaning part, I think it is extremely hard to ‘solve’ morality in the way typically meant here with CEV or what not.
I don’t think that we need (or will) wait to solve that problem before we build AGI, any more or less than we need to solve it for having children and creating a new generation of humans.
If we can build AGI somewhat better than us according to our current moral criteria, they can build an even better successive generation, and so on—a benevolence explosion.
As for the second part about preserving it accurately, I think that ethics/morality is complex enough that it can only be succinctly expressed in symbolic associative human languages. An AGI could learn how to model (and value) the preferences of others in much the same way humans do.
Someone help me out. What is the right post to link to that goes into the details of why I want to scream “No! No! No! We’re all going to die!” in response to this?
Coming of Age sequence examined realization of this error from Eliezer’s standpoint, and has further links.
In which post? I’m not finding discussion about the supposed danger of improved humanish AGI.
That Tiny Note of Discord, say. (Not on “humanish” AGI, but eventually exploding AGI.)
I don’t see much of a relation at all to what i’ve been discussing in that first post.
[http://lesswrong.com/lw/lq/fake_utility_functions/] is a little closer, but still doesn’t deal with human-ish AGI.
Why would an AI which optimises for one thing create another AI that optimises for something else? Not every change is an improvement, but every improvement is necessarily a change. Building an AI with a different utility function is not going to satisfy the first AI’s utility function! So whatever AI the first one builds is necessarily going to either have the same utility function (in which case the first AI is working correctly), or have a different one (which is a sign of malfunction, and given the complexity of morality, probably a fatal one).
It’s not possible to create an AGI that is “somewhat better than us” in the sense that it has a better utility function. To the extent that we have a utility function at all, it would refer to the abstract computation called “morality”, which “better” is defined by. The most moral AI we could create is therefore one with precisely that utility function. The problem is that we don’t exactly know what our utility function is (hence CEV).
There is a sense in which a Friendly AGI could be said to be “better than us”, in that a well-designed one would not suffer from akrasia and whatever other biases prevent us from actually realizing our utility function.
AI’s without utility functions, but some other motivational structure, will tend to self-improve to a utility function AI. Utility-function AI’s seem more stable under self-improvement, but there are many reasons it might want to change its utility (eg speed of access, multi-agent situations).
Could you clarify what you mean by an “other motivational structure?” Something with preference non-transitivity?
For instance: http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf
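To make the "preference non-transitivity" case from the previous comment concrete (a toy, with invented items and fees; the linked paper makes the broader argument): an agent with cyclic pairwise preferences can be walked around the cycle for a fee, ending up holding what it started with but poorer, which is the standard pressure pushing such structures toward something a utility function can represent.

```python
# Cyclic pairwise preferences: A over B, B over C, C over A.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}

def will_trade(current, offered):
    return (offered, current) in prefers   # trades whenever it prefers what's offered

def money_pump(start="B", offers=("A", "C", "B", "A", "C", "B"), fee=1):
    item, paid = start, 0
    for offered in offers:
        if will_trade(item, offered):
            item, paid = offered, paid + fee
    return item, paid

print(money_pump())  # ('B', 6): back where it started, six fees poorer
```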
It wouldn’t if it initially considered itself to be the only agent in the universe. But if it recognizes the existence of other agents and the impact of other agents’ decisions on its own utility, then there are many possibilities:
The new AI could be created as a joint venture of two existing agents.
The new AI could be built because the builder was compensated for doing so.
The new AI could be built because the builder was threatened into doing so.
This may seem intuitively obvious, but it is actually often false in a multi-agent environment.
Yes it certainly can—if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
Sure it is—this part at least is easy. For example, an AGI that is fully altruistic and only experiences love as its single emotion would be clearly “somewhat better than us” from our perspective, in every sense that matters.
If that AGI would not be somewhat better than us in the sense of having a better utility function, then ‘utility function’ is not a useful concept.
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
Before tackling that problem, it would probably be best to start with something much simpler, such as a utility function that could recognize dogs vs cats and other objects in images. If you actually research this it quickly becomes clear that real world intelligences make decisions using much more complexity than a simple utility-maximizing algorithm.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes. If some of the child AI’s goals (for example those involved in being more good) are opposed to the parent’s goals (for example those which make the parent AI less good), the parent is not going to just let the child achieve its goals. Rational agents do not let their utility functions change.
If you mean that the AI doesn’t suffer from the akrasia and selfishness and emotional discounting and uncertainty about our own utility function which prevents us from acting out our moral beliefs then I agree with you. That’s the AI being more rational than us, and therefore better optimising for its utility function. But a literally better utility function is impossible, given that “better” is defined by our utility function.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function. If “better” were a different utility function then it would be unclear why we are trying to create an AI that does that, rather than what we want.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
So do we create children as our ‘slaves’ for our own purposes? You seem to be categorically ruling out the entire possibility of humans creating human-like AIs that have a parent-child relationship with their creators.
So just to make it precisely clear, I’m talking about that type of AI specifically. The importance and feasibility of that type of AGI vs other types is a separate discussion.
I don’t see it as having anything to do with rationality.
The altruistic human-ish AGI mentioned above would be better than current humans from our current perspective—more like what we wish ourselves to be, and more able to improve our world than current humans.
Yes.
This is obvious if its ‘utility function’ is just a projection of my own—i.e. it simulates what I would want and uses that as its utility function, but that isn’t even necessary—its utility function could be somewhat more complex than just a simulated projection of my own and still help fulfill my utility function.
If by inspection you just mean teach the AI morality in human language, then I agree, but that’s a side point.
So: I want to finish my novel, but I spend the day noodling around the Internet instead.
Then Omega hands me an AI which it assures me is programmed error-free to analyze me and calculate my utility function and optimize my environment in terms of it.
I run the AI, and it determines exactly which parts of my mind manifest a desire to finish the novel, which parts manifest a desire to respond to the Internet, and which parts manifest a desire to have the novel be finished. Call them M1, M2 and M3. (They are of course overlapping sets.) Then it determines somehow which of these things are part of my utility function, and which aren’t, and to what degree.
So...
Case 1: The AI concludes that M1 is part of my utility function and M2 and M3 are not. Since it is designed to maximize my utility, it constructs an environment in which M1 triumphs. For example, perhaps it installs a highly sophisticated filter that blocks out 90% of the Internet. Result: I get lots more high-quality work done on the novel. I miss the Internet, but the AI doesn’t care, because that’s the result of M2 and M2 isn’t part of my utility function.
Case 2: The AI concludes that M3 and M2 are part of my utility function and M1 is not, so it finishes the novel itself and modifies the Internet to be even more compelling. I miss having the novel to work on, but again the AI doesn’t care.
Case 3: The AI concludes that all three things are part of my utility function. It finishes the novel but doesn’t tell me about it, thereby satisfying M3 (though I don’t know it). It makes a few minor tweaks to my perceived environment, but mostly leaves them alone, since it is already pretty well balanced between M1 and M2 (which is not surprising, since I was responding to those mental structures when I constructed my current situation).
If I’m understanding you correctly, you’re saying that I can’t really know which of these results (or of countless other possibilities) will happen, but that whichever one it is, I should have high confidence that all other possibilities would by my own standards have been worse… after all, that’s what it means to maximize my utility function.
Yes?
It seems to follow that if the AI has an added feature whereby I can ask it to describe what it’s about to do before it does it and then veto doing it, I ought not invoke that feature. (After all, I can’t make the result better, but I might make the result worse.)
Yes?
Assuming you trust Omega to mean the same thing as you do when talking about your preferences and utility function, then yes. If the AI looks over your mind and optimizes the environment for your actual utility function (which could well be a combination of M1, M2 and M3), then any veto you do must make the result worse than the optimal one.
Of course, if there’s doubt about the programming of the AI, use of the veto feature would probably be wise, just in case it’s not a good genie.
You seem to be imagining a relatively weak AI. For instance, given the vast space of possibilities, there are doubtlessly environmental tweaks that would result in more fun on the internet and more high-quality work on the novel. (This is to say nothing of more invasive interventions.)
The answer to your questions is yes: assuming the AI does what Omega says it does, you won’t want to use your veto.
Not necessarily weak overall, merely that it devotes relatively few resources to addressing this particular tiny subset of my preference-space. After all, there are many other things I care about more.
But, sure, a sufficiently powerful optimizer will come up with solutions so much better that it will never even occur to me to doubt that all other possibilities would be worse. And given a sufficiently powerful optimizer, I might as well invoke the preview feature if I feel like it, because I’ll find the resulting preview so emotionally compelling that I won’t want to use my veto.
That case obscures rather than illustrates the question I’m asking, so I didn’t highlight it.
Case 4: The AI makes tweaks to your current environment in order to construct it in accordance with your mental structures, but in a way more efficient than you could have in the first place.
Sure. In which case I still noodle around on the Internet a bunch rather than work on my novel, but at least I can reassure myself that this optimally reflects my real preferences, and any belief I might have that I would actually rather get more work done on my novel than I do is simply an illusion.
If those are, in fact, your real preferences, then sure.
I occasionally point out that you can model any computable behaviour using a utility-maximizing algorithm, provided you are allowed to use a partial recursive utility function.
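A minimal sketch of what I mean, assuming the behaviour is handed to us as a computable policy (and glossing over the partial-recursive caveat, which only matters when the policy itself might fail to halt; all names here are mine):

```python
# Any computable policy can be redescribed as utility maximization: define a
# utility function that pays 1 for doing whatever the policy would have done
# in the current situation and 0 for anything else.

def policy(observation):
    """Some arbitrary computable behaviour we want to redescribe."""
    return "browse_internet" if observation == "bored" else "write_novel"

def utility(observation, action):
    # Computable as long as the policy is.
    return 1.0 if action == policy(observation) else 0.0

def utility_maximizer(observation, actions=("browse_internet", "write_novel")):
    return max(actions, key=lambda a: utility(observation, a))

# The "maximizer" reproduces the original behaviour exactly.
assert all(utility_maximizer(o) == policy(o) for o in ("bored", "rested"))
```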
Please read the sequences, and stop talking about AI until you do.
I’ve read the sequences. Discuss or leave me alone.
Thanks, that’s useful to know.
Edit: Seriously, no irony, that’s useful. Disagreement should be treated differently depending on background.
Also, very little of the sequences has much of anything to do with AI. If I wanted to learn more about that, I would look to Norvig’s book or, more likely, the relevant papers online. No need to be rude just because I don’t hold all of your beliefs.
It’s more of a problem with your understanding of ethics, as applied to AI (and since this is the main context in which AI is discussed here, I referred to that as simply AI). You might be very knowledgeable in contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs.
Unfortunately there is a need (in some senses of “rude”, such as discouraging certain conversational modes).
I see the potential risks in building AGIs.
I don’t see that risk being dramatically high for creating AGIs based loosely on improving the human brain, and this approach appears to be mainstream now, or to be becoming mainstream (Kurzweil, Hawkins, DARPA’s neuromorphic initiative, etc.).
I’m interested in the serious discussion or analysis of why that risk could be high.
You have been favourably discussing the creation of AGIs that are programmed to create AGIs with values different from their own. No, you do not understand the potential risks.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
This process is not magically guaranteed to preserve our best interests as judged from our current perspective when carried over to AGI, but neither is it guaranteed to spontaneously destroy the world.
Your analogy with evolution is spot on: if the values are going to drift at all, we want to drift towards some target point, by selecting against sub-AIs that have values further from the point.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all? It seems like it ends up with the same result, but with slightly less complication.
And if we can’t set a target point for the value-drift evolution… then it might drift anywhere at all! The chances that it would drift somewhere we’d like are pretty small. This applies even to a human-brain-based AGI: in general, people are quite apt to become corrupt when given even a tiny bit of extra power. A whole load of extra power, like superintelligence would grant, would have a good chance of distorting that human’s values dramatically, possibly with disastrous effects.
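Here, concretely, is the kind of target-directed selection I mean, with values treated as points in some space (everything below is a made-up toy, not a claim about how real value learning would work):

```python
import random

TARGET = [1.0, 0.0, 0.5]     # the value point we want the drift to head toward

def distance(values):
    return sum((v - t) ** 2 for v, t in zip(values, TARGET)) ** 0.5

def next_generation(parent_values, candidates=10, mutation=0.1):
    """The parent spawns slightly mutated successors; we keep the one whose
    values are closest to the target (selection against drift away from it)."""
    offspring = [[v + random.gauss(0, mutation) for v in parent_values]
                 for _ in range(candidates)]
    return min(offspring, key=distance)

# Note the catch: running this loop requires already being able to write down
# TARGET and distance() -- at which point we could have put them straight into
# the first AI's utility function and skipped the drift entirely.
values = [0.2, 0.8, 0.3]
for _ in range(50):
    values = next_generation(values)
print(distance(values))      # small: the drift has been steered toward TARGET
```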
Yes.
The true final ‘target point’ is unknown, and unknowable in principle. We don’t have the intelligence/computational power right now to know it, no AGI we can build will know it exactly, and this will forever remain true.
Our values are so complex that the ‘utility function’ that describes them is our entire brain circuit—and as we evolve into more complex AGI designs our values will grow in complexity as well.
Fixing them completely would be equivalent to trying to stop evolution. It’s pointless, suicidal, impossible.
Yes, evolution could in principle take us anywhere, but we can and already do exert control over its direction.
Humans today have a range of values, but an overriding universal value is not-dying. To this end it is crucially important that we reverse engineer the human mind.
Ultimately if what we really value is conscious human minds, and computers will soon out-compete human brains, then clearly we need to transfer human minds over to computers.
One simple point is that there is no reason to expect AGIs to stop at exactly human level. Even if progress and the increase in intelligence are very slow, eventually they become an existential risk, or at least a value risk. Every step in that direction we take now is a step in the wrong direction, which holds even if you believe it’s a small step.
This isn’t the first time I heard this, but I don’t think it’s exactly right.
We know that human level is possible. Superhuman level seems overwhelmingly likely to be possible too, from considerations like imagining a human with more working memory running faster, but we don’t technically know that it is.
We have a working example of a human level intelligence.
It’s human-level intelligences doing the work. Martian work on AI might asymptotically slow down when approaching Martian-level intelligence without that level being inherently significant for anyone else; the same goes for humans, or, for that matter, for any AGI of any level working on its own successor. (Not that I have any strong belief that this is the case; it’s just an argument for why human level wouldn’t be a completely arbitrary slowdown point.)
I’d completely agree with “there is no strong reason to expect AGIs to stop at exactly human level”, “High confidence* in AGIs stopping at exactly human level is irrational” or “expecting AGIs not to stop at exactly human level would be prudent.”
*Personally I’d assign a probability of under 0.2 to the best AGIs being on a level roughly comparable to human level (let’s say able to solve any problem, except human relationship problems, that every IQ 80+ human can solve, but not better at every task than any human) for at least 50 years (physical time in Earth’s frame of reference, not subjective time; this probably means being inferior at an equal clock rate but making up for it with speed for most of that period). That’s still a lot more than I would assign to any other place on the intelligence scale, of course.
Could the downvoter please say what they are disagreeing with? I can see at least a dozen mutually contradictory possible angles so “someone thinks something about posting this is wrong” provides almost no useful information.
Thanks for the value risk link—that discussion is what I’m interested in.
I guess I’ll reply to it there. The initial quotes from Ben G. and Hanson are similar to my current view.
There is some discussion of the dangers of a uFAI Singularity, particularly in this debate between Robin Hanson and Eliezer. Much of the danger arises from the predicted short time period required to get from a mere human-level AI to a superhuman AI+. Eliezer discusses some reasons to expect it to happen quickly here and here. The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
For an analysis of the possibility of a hard takeoff in approaches to AI based loosely on modeling or emulating the human brain, see this posting by Carl Shulman, for example.
If civilisation(t+1) can access resources much better than civilisation(t), then that is just another way of saying things are going fast—one must beware of assuming what one is trying to demonstrate here.
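(A toy recurrence makes the circularity explicit; the numbers are arbitrary. “Each generation accesses resources much better than the last” is itself the fast-growth assumption:)

```python
# capability(t+1) = capability(t) * access(t), where access improves with
# capability. Granting the "much better resource access" premise each step
# just restates the claim that growth is fast.

capability = 1.0
for t in range(10):
    access = 1.0 + 0.5 * capability   # better capability -> better resource access
    capability *= access
    print(t, capability)
# The multiplier itself keeps growing, so growth is faster than exponential.
```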
The problem I see with this thinking is the idea that civilisation(t) is a bunch of humans while civilisation(t+1) is a superintelligent machine.
In practice, civilisation(t) is a man-machine symbiosis, while civilisation(t+1) is another man-machine symbiosis with a little bit less man, and a little bit more machine.
There are some promising lines of attack (grounded in decision theory) that might take only a few years of research. We’ll see where they lead. Other open problems in FAI might start looking very solvable if we start making progress on this front.
Show me.
PM’d.
Yes. :)
Yes, but it still has to be explicitly programmed to do that! The question is how to get it to do so. AFAIK shaper-anchor semantics is still quite a ways from being fully specified, but the bigger obstacle seems to be that whoever writes the AI is, more likely than not, not going to make the effort to program it that way in the first place.
This is surely the kind of thing that superintelligences will be good at. They will have access to every paperclip picture on the net, every paperclip specification too. They will surely have a much clearer idea about what a paperclip is than humans do. They will know what boxes are too.
I made a stab at it here, and it got some upvotes. So here’s a repost:
Make a wire, 10 cm long and 1mm in diameter, composed of an alloy of 99.8% iron and 0.2% carbon. Start at one end and bend it such that the segments from 2-2.5cm, 2.75-3.25cm, 5.25-5.75cm form half-circles, with all the bends in the same direction and forming an inward spiral (the end with the first bend is outside the third bend).
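That verbal spec translates into something machine-checkable almost mechanically. Here is one way it might look; the field names and the tolerance are my own invention, and a real superintelligence would presumably induce something like this from examples rather than needing it spelled out:

```python
# The paperclip spec above, written as a checkable data structure.

PAPERCLIP_SPEC = {
    "wire_length_cm": 10.0,
    "wire_diameter_mm": 1.0,
    "alloy_fractions": {"Fe": 0.998, "C": 0.002},
    # Half-circle bends given as (start_cm, end_cm), all in the same direction,
    # forming an inward spiral (first bend outside the third).
    "half_circle_bends_cm": [(2.0, 2.5), (2.75, 3.25), (5.25, 5.75)],
    "spiral_direction": "inward",
}

def matches_spec(candidate, tolerance=0.05):
    """Crude check that a measured candidate object fits the spec."""
    close = lambda a, b: abs(a - b) <= tolerance * b
    return (close(candidate["wire_length_cm"], PAPERCLIP_SPEC["wire_length_cm"])
            and close(candidate["wire_diameter_mm"], PAPERCLIP_SPEC["wire_diameter_mm"])
            and len(candidate["half_circle_bends_cm"])
                == len(PAPERCLIP_SPEC["half_circle_bends_cm"]))
```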
(Please let me know if reposting violates LW etiquette so I know not to do it again.)
I don’t think it violates LW etiquette.
Here’s a sort of fully general counterargument against proposals to naturalize human concepts in AI terms: if you can naturalize human concepts, you should be able to naturalize the human concept of a box. And if you can do that, we can build Oracle AI and save the world. It’s very easy to describe what we mean by ‘stay in the box’, but it turns out that seed (self-modifying!) AIs just don’t have a natural ontology for the descriptions.
This argument might be hella flawed; it seems kind of tenuous.
Aren’t you simply assuming that the world is doomed here? It sure looks like it!
Since when is that assumption part of a valid argument?
That assumption isn’t really a core part of the argument… the general “if specifying human concepts is easy, then come up with a plan for making a seed AI want to stay in a box” argument still stands, even if we don’t actually want to keep arbitrary seed AIs in boxes.
For the record I am significantly less certain than most LW or SIAI singularitarians that seed AIs not explicitly coded with human values in mind will end up creating a horrible future, or at least a more horrible future than something like CEV. I do think it’s worth a whole lot of continued investigation.