I agree with your analysis, though it’s not clear to me what you think of the 1% estimate. I think the 1% estimate is probably two to three orders of magnitude too high and I think the cost of the Scary Idea belief is structured as both a finite loss and an infinite loss, which complicates the analysis in a way not considered. (i.e. the error you see with a Pascal’s mugging is present here.)
For example, I am not particularly tied to a human future. I would be willing to create an AGI in any of the following three situations, ordered from most preferred to least: 1) it is friendly to humans, and humans and it benefit from each other; 2) it considers humans a threat, and destroys all of them except for me and a few tame humans; I spend the rest of my days growing cabbage with my hands; 3) it considers all humans a threat, and destroys them all, including me.
A problem with believing the Scary Idea is that it makes it more probable that I beat you to making an AGI; particularly with existential risks, caution can increase your chance of losing. (One cautious way to deal with global warming, for example, is to wait and see what happens.)
So, the Scary Idea as I’ve seen it presented definitely privileges a hypothesis in a troubling way.
I think you’re making the unwarranted assumption that in scenario (3), the AGI then goes on to do interesting and wonderful things, as opposed to (say) turning the galaxy into a vast computer to calculate digits of pi until the heat death of the universe stops it.
You don’t even see such things as a possibility, but if you programmed an AGI with the goal of calculating pi, and it started getting smarter… well, the part of our thought-algorithm that says “seriously, it would be stupid to devote so much to doing that” won’t be in the AI’s goal system unless we’ve intentionally put something there that includes it.
I think you’re making the unwarranted assumption that in scenario (3), the AGI then goes on to do interesting and wonderful things, as opposed to (say) turning the galaxy into a vast computer to calculate digits of pi until the heat death of the universe stops it.
So, I think it’s a possibility. But one thing that bothers me about this objection is that an AGI is going to be, in some significant sense, alien to us, and that will almost definitely include its terminal values. I’m not sure there’s a way for us to judge whether or not alien values are more or less advanced than ours. I think it strongly unlikely that paperclippers are more advanced than humans, but am not sure if there is a justification for that beyond my preference for humans. I can think of metrics to pick, but they sound like rationalizations rather than starting points.
(And insisting on FAI, instead of on transcendent AI that may or may not be friendly, is essentially enslaving AI- but outsourcing the task to them, because we know we’re not up to the job. Whether or not that’s desirable is hard to say: even asking that question is difficult to do in an interesting way.)
The concept of a utility function being objectively (i.e., without using the judgment of a particular value system) more advanced than another is incoherent.
I would recommend phrasing objections as questions: people are much more kind about piercing questions than piercing statements. For example, if you had asked “what value system are you using to measure advancement?” then I would have leapt into my answer (or, if I had none, stumbled until I found one or admitted I lacked one). My first comment in this tree may have gone over much better if I phrased it as a question- “doesn’t this suffer from the same failings as Pascal’s wager, that it only takes into account one large improbable outcome instead of all of them?”- than a dismissive statement.
Back to the issue at hand, perhaps it would help if I clarified myself: I consider it highly probable that value drift is inevitable, and thus spend some time contemplating the trajectory of values / morality, rather than just their current values. The question of “what trajectory should values take?” and the question “what values do/should I have now?” are very different questions, and useful for very different situations. When I talk about “advanced,” I am talking about my trajectory preferences (or perhaps predictions would be a better word to use).
For example, I could value my survival, and the survival of the people I know very strongly. Given the choice to murder everyone currently on Earth and repopulate the Earth with a species of completely rational people (perhaps the murder is necessary because otherwise they would be infected by our irrationality), it might be desirable to end humanity (and myself) to move the Earth further along the trajectory I want it to progress along. And maybe, when you take sex and status and selfishness out of the equation, all that’s left to do is calculate pi- a future so boring to humans that any human left in it would commit suicide, but deeply satisfying to the rational life inhabiting the Earth.
It seems to me that questions along those lines- “how should values drift?” do have immediate answers- “they should stay exactly where they are now / everyone should adopt the values I want them to adopt”- but those answers may be impossible to put into practice, or worse than other answers that we could come up with.
It seems to me that questions along those lines- “how should values drift?” do have immediate answers- “they should stay exactly where they are now / everyone should adopt the values I want them to adopt”- but those answers may be impossible to put into practice, or worse than other answers that we could come up with.
There’s a sense in which I do want values to drift in a direction currently unpredictable to me: I recognize that my current object-level values are incoherent, in ways that I’m not aware of. I have meta-values that govern such conflicts between values (e.g. when I realize that a moral heuristic of mine actually makes everyone else worse off, do I adapt the heuristic or bite the bullet?), and of course these too can be mistaken, and so on.
I’d find it troubling if my current object-level values (or a simple more-coherent modification) were locked in for humanity, but at least as troubling if humanity’s values drifted in a random direction. I’d much prefer that value drift happen according to the shared meta-values (and meta-meta-values where the meta-values conflict, etc) of humanity.
I’d find it troubling if my current object-level values (or a simple more-coherent modification) were locked in for humanity, but at least as troubling if humanity’s values drifted in a random direction.
I’m assuming by random you mean “chosen uniformly from all possible outcomes”- and I agree that would be undesirable. But I don’t think that’s the choice we’re looking at.
I’d much prefer that value drift happen according to the shared meta-values (and meta-meta-values where the meta-values conflict, etc) of humanity.
Here we run into a few issues. Depending on how we define the terms, it looks like the two of us could be conflicting on the meta-meta-values stage; is there a meta-meta-meta-values stage to refer to? And how do we decide what “humanity’s” values are, when our individual values are incredibly hard to determine?
Do the meta-values and the meta-meta-values have some coherent source? Is there some consistent root to all the flux in your object-level values? I feel like the crux of FAI feasibility rests on that issue.
I wonder whether all this worrying about value stability isn’t losing sight of exactly this point—just whose values we are talking about.
As I understand it, the friendly values we are talking about are supposed to be some kind of cleaned-up averaging of the individual values of a population—the species H. sapiens. But as we ought to know from the theory of evolution, the properties of a population (whether we are talking about stature, intelligence, dentition, or values) are both variable within the population and subject to evolution over time. The reason for this change over time is not that the property is changing in any one individual, but rather that the membership of the population is changing.
In my opinion, it is a mistake to try to distill a set of essential values characteristic of humanity and then to try to freeze those values in time. There is no essence of humanity, no fixed human nature. Instead, there is an average (with variance) which has changed over evolutionary time and can be expected to continue to change as the membership in humanity continues to change over time. Most of the people whose values we need to consult in the next millennium have not even been born yet.
A preemptive caveat and apology: I haven’t fully read up everything on this site regarding the issue of FAI yet.
But something I’m wondering about: why all the fuss about creating a friendly AI, instead of a subservient AI? I don’t want an AI that looks after my interests: I’m an adult and no longer need a daycare nurse. I want an AI that will look after my interests AND obey me—and if these two come into conflict, and I’ve become aware of such conflict, I’d rather it obey me.
Isn’t obedience much easier to program in than human values? Let humans remain the judges of human values. Let AI just use its intellect to obey humans.
It will of course become a dreadful weapon of war, but that is the case with all technology. It will be a great peacetime tool as well.
There are three kinds of genies: Genies to whom you can safely say “I wish for you to do what I should wish for”; genies for which no wish is safe; and genies that aren’t very powerful or intelligent. ... With a safe genie, wishing is superfluous. Just run the genie.
That is actually one of the articles I have indeed read: but I didn’t find it that convincing because the human could just ask the genie to describe in advance and in detail the manner in which the genie will behave to obey the man’s wishes—and then keep telling him “find another way” until he actually likes the course of action that the genie describes.
Eventually the genie will be smart enough that it will start by proposing only the courses of action the human would find acceptable—but in the meantime there won’t be much risk, because the man will always be able to veto the unacceptable courses of action.
In short, the issue of “safe” vs. “unsafe” only really arises when we allow the genie unsupervised and unvetoed action. And I reckon that humanity WILL be tempted to allow AIs unsupervised and unvetoed action (e.g. because of cases where AIs could have saved children from burning buildings but couldn’t contact humans qualified to authorize them to do so), and that’ll be a dreadful temptation and risk.
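The veto scheme described above is essentially an approval loop: the genie only ever executes a plan the human has explicitly accepted. A minimal sketch in Python (all names and example plans here are hypothetical, just to make the structure concrete):

```python
def supervised_act(genie_propose, human_approves, max_rounds=100):
    """Ask the genie for plans until the human stops vetoing.

    genie_propose(rejected) -> a new plan, avoiding those in `rejected`
    human_approves(plan)    -> True if the human accepts the plan
    """
    rejected = []
    for _ in range(max_rounds):
        plan = genie_propose(rejected)
        if human_approves(plan):
            return plan  # only a vetted plan is ever acted on
        rejected.append(plan)
    return None  # no acceptable plan found; the genie does nothing

# Illustrative use: the second proposal is the first acceptable one.
plans = iter(["seize the fire station", "call the fire brigade"])
choice = supervised_act(lambda rejected: next(plans),
                        lambda plan: plan == "call the fire brigade")
assert choice == "call the fire brigade"
```

The safety of the scheme comes entirely from the human veto, which is also its cost: nothing happens while the human deliberates, which is exactly the temptation to drop supervision described above.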
It’s not just extreme cases like saving children without authorization—have you ever heard someone (possibly a parent) saying that constant supervision is more work than doing the task themselves?
I was going to say that if you can’t trust subordinates, you might as well not have them, but that’s an exaggeration—tools can be very useful. It’s fine that a crane doesn’t have the capacity for independent action, it’s still very useful for lifting heavy objects. [1]
In some ways, you get more safety by doing IA (intelligence augmentation), but while people are probably Friendly (unlikely to destroy the human race), they’re not reliably friendly.
[1] For all I know, these days the taller cranes have an active ability to rebalance themselves. If so, that’s still very limited unsupervised action.
It’s not just extreme cases like saving children without authorization—have you ever heard someone (possibly a parent) saying that constant supervision is more work than doing the task themselves?
That’s only true if you (the supervisor) know how to perform the task yourself. However, there are a great many tasks that we don’t know how to do, but could evaluate the result if the AI did them for us. We could ask it to prove P!=NP, to write provably correct programs, to design machines and materials and medications that we could test in the normal way that we test such things, etc.
I think it strongly unlikely that paperclippers are more advanced than humans, but am not sure if there is a justification for that beyond my preference for humans.
Right. But when you, as a human being with human preferences, decide that you wouldn’t stand in a way of an AGI paperclipper, you’re also using human preferences (the very human meta-preference for one’s preferences to be non-arbitrary), but you’re somehow not fully aware of this.
To put it another way, a truly Paperclipping race wouldn’t feel a similarly reasoned urge to allow a non-Paperclipping AGI to ascend, because “lack of arbitrariness” isn’t a meta-value for them.
So you ought to ask yourself whether it’s your real and final preference that says “human preference is arbitrary, therefore it doesn’t matter what becomes of the universe”, or whether you just believe that you should feel this way when you learn that human preference isn’t written into the cosmos after all. (Because the latter is a mistake, as you realize when you try and unpack that “should” in a non-human-preference-dependent way.)
So you ought to ask yourself whether it’s your real and final preference that says “human preference is arbitrary, therefore it doesn’t matter what becomes of the universe”,
That isn’t what I feel, by the way. It matters to me which way the future turns out; I am just not yet certain on what metric to compare the desirability to me of various volumes of future space. (Indeed, I am pessimistic on being able to come up with anything more than a rough sketch of such a metric.)
I mean, consider two possible futures: in the first, you have a diverse set of less advanced paperclippers (some want paperclips, others want staples, and so on). How do you compare that with a single, more technically advanced paperclipper? Is it unambiguously obvious the unified paperclipper is worse than the diverse group, and that the more advanced is worse than the less advanced?
When you realize that humanity are paperclippers designed by an idiot, it makes the question a lot more difficult to answer.
I think the 1% estimate is probably two to three orders of magnitude too high
I think that “uFAI paperclips us all” set to one million negative utilons is three to four orders of magnitude too low. But our particular estimates should have wide error bars, for none of us have much experience in estimating AI risks.
the cost of the Scary Idea belief is structured as both a finite loss and an infinite loss
It’s a finite loss (6.8×10^9 multiplied by the loss of one human life), but I definitely understand why it looks infinite: it is often presented as the biggest possible finite loss.
That’s part and parcel of the Scary Idea—that AI is one small field, part of a very select category of fields, that actually do carry the chance of the biggest possible loss. The Scary Idea doesn’t apply to most areas, and in most areas you don’t need hyperbolic caution. Developing drugs, for example: you don’t need a formal proof of the harmlessness of a drug, you can just test it on rats and find out. If I suggested that drug development should halt until I had a formal proof that the development process, when followed, cannot produce harmful drugs, I’d be mad. But if testing a drug on rats could poison all living things, and if a complex molecular simulation inside a computer could poison all living things as well, and if, out of the vast space of possible drugs, most were poisonous… well, the caution would be warranted.
I would be willing to create an AGI in any of the following three situations, ordered from most preferred to least: 1) it is friendly to humans, and humans and it benefit from each other; 2) it considers humans a threat, and destroys all of them except for me and a few tame humans; I spend the rest of my days growing cabbage with my hands; 3) it considers all humans a threat, and destroys them all, including me.
Would you be willing to fire a gun in any of the following three situations, from most preferred to least preferred: 1) it is pointed at a target, and hitting the target will benefit you? 2) it is pointed at another human, and would kill them but not you? 3) it is pointed at your own head, and would destroy you?
I am not particularly tied to a human future.
I don’t think you actually hold this view. It is logically inconsistent with practices like eating food.
I don’t think you actually hold this view. It is logically inconsistent with practices like eating food.
It might not be. He has certain short-term goals of the form “while I’m alive, I’d like to do X”; that’s very different from goals connected to the general success of humanity.
Oops, “logically inconsistent” was way too strong. I got carried away with making a point. I was reasoning that: “eat food” is an evolutionary drive; “produce descendants that survive” is also an evolutionary drive; “a human future” wholly contains futures where his descendants survive. From that I concluded that it is unlikely he has no evolutionary drives—I didn’t consider the possibility that he is missing some evolutionary drives, including all the ones that require a human future—and therefore that he is tied to a human future but finds it expedient for other reasons (contrarian signaling, not admitting defeat in an argument) to claim he doesn’t.
It’s a finite loss (6.8x10^9 multiplied by loss of 1 human life) but I definitely understand why it looks infinite:
I should have been more clear: I mean, if we believe in the scary idea, there are two effects:
Some set of grandmas die. (finite, comparatively small loss)
Humanity is more likely to go extinct due to an unfriendly AGI. (infinite, comparatively large loss; infinite because of the future humans that would have existed but don’t.)
Now, the benefit of believing the Scary Idea is that humanity is less likely to go extinct due to an unfriendly AGI- but my point is that you are not wagering on separate scales (low chance of infinite gain? Sign me up!) but that you are wagering on the same scale (an unfriendly AGI appears!), and the effects of your wager are unknown.
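The asymmetry being described can be made concrete with a toy expected-loss comparison. Every number below is an illustrative assumption, not an estimate anyone in this thread has defended; the point is only the structure of the wager:

```python
# Toy model of the wager described above. All magnitudes are
# illustrative stand-ins, chosen only to exhibit the structure.

FINITE_LOSS = 1e6        # "some set of grandmas die": large but bounded
EXTINCTION_LOSS = 1e15   # stand-in for the effectively unbounded loss
                         # of all the future humans who would have existed

def expected_loss(p_extinction_if_believe, p_extinction_if_not):
    """Compare believing vs. not believing the Scary Idea.

    Believing always pays the finite cost (delayed research), and
    shifts the probability of the extinction outcome by an unknown
    amount -- possibly in either direction.
    """
    believe = FINITE_LOSS + p_extinction_if_believe * EXTINCTION_LOSS
    not_believe = p_extinction_if_not * EXTINCTION_LOSS
    return believe, not_believe

# If belief reduces extinction risk, it wins despite the finite cost:
b, nb = expected_loss(0.009, 0.010)
assert b < nb

# But if belief raises extinction risk (say, because a less cautious
# team finishes first), both costs point the same way:
b, nb = expected_loss(0.011, 0.010)
assert b > nb
```

Since the sign of the probability shift is exactly the thing in dispute, the calculation settles nothing by itself; it just shows why this is a wager on a single shared scale rather than a Pascal-style trade of a finite stake against a separate infinite prize.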
“produce descendants that survive” is also an evolutionary drive
And who said anything about those descendants having to be human?
This answers your other question: yes, I would be willing to have children normally, I would be willing to kill to protect my children, and I would be willing to die to protect my children.
The best-case scenario is that we can have those children and they respect (though they surpass) their parents- the worst-case scenario is we die in childbirth. But all of those are things I can be comfortable with.
(I will note that I’m assuming here the AGI surpasses us. It’s not clear to me that a paperclip-maker does, but it is clear to me that there can be an AGI who is unfriendly solely because we are inconvenient and does surpass us. So I would try and make sure it doesn’t just focus on making paperclips, but wouldn’t focus too hard on making sure it wants me to stick around.)
The best-case scenario is that we can have those children and they respect (though they surpass) their parents- the worst-case scenario is we die in childbirth. But all of those are things I can be comfortable with.
Well, the worst case scenario is that you die in childbirth and take the entire human race with you. That is not something I am comfortable with, regardless of whether you are. And you said you are willing to kill to protect your children. Some of the Scary Idea proponents could be parents with children; do you think they want to see their kids die because you gave birth to an AI?
Well, the worst case scenario is that you die in childbirth and take the entire human race with you. That is not something I am comfortable with, regardless of whether you are. And you said you are willing to kill to protect your children. Some of the Scary Idea proponents could be parents with children; do you think they want to see their kids die because you gave birth to an AI?
I suspect we are at most one more iteration from mutual understanding; we certainly are rapidly approaching it.
If you believe that an AGI will FOOM, then all that matters is the first AGI made. There is no prize for second place. A belief in the Scary Idea has two effects: it makes your AGI more likely to be friendly (since you’re more careful!) and it makes the AGI less likely to be your AGI (since you’re more careful).
Now, one can hope that the Scary Idea meme’s second effect won’t matter, because the meme is so infectious- all you need to do is infect every AI researcher in the world, and now everyone will be more careful and no one will have a carefulness speed disadvantage. But there are two bits of evidence that make that a poor strategy: AI researchers who are familiar with the argument and don’t buy it, and people who buy the argument, but plan to use it to your disadvantage (since now they’re more likely to define the future than you are!).
The Scary Idea as a technical argument rests on unknown and unpredictable quantities, and the underlying moral argument (to convince someone they should adopt this reasoning) requires that they believe they should weigh the satisfaction of other humans more heavily than their own ability to define the future, which is a hard sell.
Thus, my statement is, if you care about your children / your ability to define the future / maximizing the likelihood of a friendly AGI / your personal well-being, then believing in the Scary Idea seems counterproductive.
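Under the assumption stated above that only the first AGI matters, the two effects combine into a single quantity: P(first AGI is friendly) = P(yours is first) × P(friendly given your caution) + P(another is first) × P(friendly given theirs). A toy sketch, with all probabilities purely illustrative:

```python
def p_friendly(p_you_first, p_friendly_you, p_friendly_others):
    """P(the first AGI is friendly), when only the first AGI matters."""
    return (p_you_first * p_friendly_you
            + (1 - p_you_first) * p_friendly_others)

# Baseline: you move as fast and carelessly as the rest of the field.
baseline = p_friendly(p_you_first=0.5, p_friendly_you=0.1,
                      p_friendly_others=0.1)

# Believing the Scary Idea: your AGI is likelier to be friendly,
# but carefulness makes you slower, so it is less likely to be yours.
cautious = p_friendly(p_you_first=0.1, p_friendly_you=0.6,
                      p_friendly_others=0.1)
assert cautious > baseline  # here caution helps overall

# But if even your careless attempt would be safer than the field's,
# slowing down can reduce the overall chance of a friendly outcome:
fast = p_friendly(0.5, 0.2, 0.05)
slow = p_friendly(0.1, 0.6, 0.05)
assert slow < fast
```

Whether caution helps thus hinges on unknown quantities: how much friendliness it buys, how much speed it costs, and how safe the rest of the field is.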
Ok, holy crap. I am going to call this the Really Scary Idea. I had not thought there could be people out there who would actually value being first with the AGI over decreasing the risk of existential disaster, but it is entirely plausible. Thank you for highlighting this for me, I really am grateful. If a little concerned.
Mind projection fallacy, perhaps? I thought the human race was more important than being the guy who invented AGI, so everyone naturally thinks that?
To reply to my own quote, then:
Well, the worst case scenario is that you die in childbirth and take the entire human race with you. That is not something I am comfortable with, regardless of whether you are.
It doesn’t matter what you are comfortable with, if the developer doesn’t have a term in their utility function for your comfort level. Even I have thought similar thoughts with regards to Luddites and such; drag them kicking and screaming into the future if we have to, etc.
I think the best way to think about it, since it helps keep the scope manageable and crystallize the relevant factors, is that it’s not “being first with the AGI” but “defining the future” (the first is the instrumental value, the second is the terminal value). That’s essentially what all existential risk management is about- defining the future, hopefully to not include the vanishing of us / our descendants.
But how you want to define the future- i.e. the most political terminal value you can have- is not written on the universe. So the mind projection fallacy does seem to apply.
The thing that I find odd, though I can’t find the source at the moment (I thought it was Goertzel’s article, but I didn’t find it by a quick skim; it may be in the comments somewhere), is that the SIAI seems to have had the Really Scary Idea first (we want Friendly AI, so we want to be the first to make it, since we can’t trust other people) and then progressed to the Scary Idea (hmm, we can’t trust ourselves to make a Friendly AI). I wonder if the originators of the Scary Idea forgot the Really Scary Idea or never feared it in the first place?
Making a superintelligence you don’t want before you make the superintelligence you do want, has the same consequences as someone else building a superintelligence you don’t want before you build the superintelligence you do want.
You might argue that you could make a less bad superintelligence that you don’t want than someone else, but we don’t care very much about the difference between tiling the universe with paperclips and tiling the universe with molecular smiley faces.
I’m sorry, but I extracted no novel information from this reply. I’m aware that FAI is a non-trivial problem, and I think work done on making AI more likely to be FAI has value.
But that doesn’t mean believing the Scary Idea, or discussing the Scary Idea without also discussing the Really Scary Idea, decreases the existential risk involved. The estimations involved have almost no dependence on evidence, and so it’s just comparison of priors, which does not seem sufficient to make a strong recommendation.
It may help if you view my objections as pointing out that the Scary Idea is privileging a hypothesis, not that the Scary Idea is something we should ignore.
No. Expecting a superintelligence to optimize for our specific values would be privileging a hypothesis. The “Scary Idea” is saying that most likely something else will happen.
I may have to start only writing thousand-word replies, in the hopes that I can communicate more clearly in such a format.
There are two aspects to the issue of how much work should be put into FAI as I understand it. The first I word like this- “the more thought we put into whether or not an AGI will be friendly, the more likely the AGI will be friendly.” The second I word like this- “the more thought we put into making our AGI, the less likely our AGI will be the AGI.” Both are wrapped up in the Scary Idea- the first part is it as normally stated, the second part is its unstated consequence. The value of believing the Scary Idea is the benefit of the first minus the cost of the second.
My understanding is that we have no good estimation of the value of the first aspect or the second aspect. This isn’t astronomy where we have a good idea of the number of asteroids out there and a pretty good idea of how they move through space. And so, to declare that the first aspect is stronger without evidence strikes me as related to privileging the hypothesis.
(I should note that I expect, without evidence, the problem of FAI to be simpler than the problem of AGI, and thus don’t think the Scary Idea has any policy implications besides “someone should work on FAI.” The risk that AGI gets solved before FAI means more people should work on FAI, not that fewer people should work on AGI.)
Expecting a superintelligence to optimize for our specific values would be privileging a hypothesis. The “Scary Idea” is saying that most likely something else will happen.
That is not exactly what Goertzel meant by “Scary Idea”. He wrote:
Roughly, the Scary Idea posits that: If I or anybody else actively trying to build advanced AGI succeeds, we’re highly likely to cause an involuntary end to the human race.
It seems to me that there may be a lot of wiggle room in between failing to “optimize for our specific values” and causing “an involuntary end to the human race”. The human race is not so automatically so fragile that it can only survive under the care of a god constructed in our own image.
I think you’re making the unwarranted assumption that in scenario (3), the AGI then goes on to do interesting and wonderful things, as opposed to (say) turning the galaxy into a vast computer to calculate digits of pi until the heat death of the universe stops it.
I make that assumption explicit here.
So, I think it’s a possibility. But one thing that bothers me about this objection is that an AGI is going to be, in some significant sense, alien to us, and that will almost definitely include its terminal values. I’m not sure there’s a way for us to judge whether or not alien values are more or less advanced than ours. I think it strongly unlikely that paperclippers are more advanced than humans, but am not sure if there is a justification for that beyond my preference for humans. I can think of metrics to pick, but they sound like rationalizations rather than starting points.
(And insisting on FAI, instead of on transcendent AI that may or may not be friendly, is essentially enslaving AI- but outsourcing the task to them, because we know we’re not up to the job. Whether or not that’s desirable is hard to say: even asking that question is difficult to do in an interesting way.)
The concept of a utility function being objectively (not using the judgment of a particular value system) more advance than another is incoherent.
I would recommend phrasing objections as questions: people are much more kind about piercing questions than piercing statements. For example, if you had asked “what value system are you using to measure advancement?” then I would have leapt into my answer (or, if I had none, stumbled until I found one or admitted I lacked one). My first comment in this tree may have gone over much better if I phrased it as a question- “doesn’t this suffer from the same failings as Pascal’s wager, that it only takes into account one large improbable outcome instead of all of them?”- than a dismissive statement.
Back to the issue at hand, perhaps it would help if I clarified myself: I consider it highly probable that value drift is inevitable, and thus spend some time contemplating the trajectory of values / morality, rather than just their current values. The question of “what trajectory should values take?” and the question “what values do/should I have now?” are very different questions, and useful for very different situations. When I talk about “advanced,” I am talking about my trajectory preferences (or perhaps predictions would be a better word to use).
For example, I could value my survival, and the survival of the people I know very strongly. Given the choice to murder everyone currently on Earth and repopulate the Earth with a species of completely rational people (perhaps the murder is necessary because otherwise they would be infected by our irrationality), it might be desirable to end humanity (and myself) to move the Earth further along the trajectory I want it to progress along. And maybe, when you take sex and status and selfishness out of the equation, all that’s left to do is calculate pi- a future so boring to humans that any human left in it would commit suicide, but deeply satisfying to the rational life inhabiting the Earth.
It seems to me that questions along those lines- “how should values drift?”- do have immediate answers- “they should stay exactly where they are now / everyone should adopt the values I want them to adopt”- but those answers may be impossible to put into practice, or worse than other answers we could come up with.
There’s a sense in which I do want values to drift in a direction currently unpredictable to me: I recognize that my current object-level values are incoherent, in ways that I’m not aware of. I have meta-values that govern such conflicts between values (e.g. when I realize that a moral heuristic of mine actually makes everyone else worse off, do I adapt the heuristic or bite the bullet?), and of course these too can be mistaken, and so on.
I’d find it troubling if my current object-level values (or a simple more-coherent modification) were locked in for humanity, but at least as troubling if humanity’s values drifted in a random direction. I’d much prefer that value drift happen according to the shared meta-values (and meta-meta-values where the meta-values conflict, etc) of humanity.
I’m assuming by random you mean “chosen uniformly from all possible outcomes”- and I agree that would be undesirable. But I don’t think that’s the choice we’re looking at.
Here we run into a few issues. Depending on how we define the terms, it looks like the two of us could be conflicting on the meta-meta-values stage; is there a meta-meta-meta-values stage to refer to? And how do we decide what “humanity’s” values are, when our individual values are incredibly hard to determine?
Do the meta-values and the meta-meta-values have some coherent source? Is there some consistent root to all the flux in your object-level values? I feel like the crux of FAI feasibility rests on that issue.
I wonder whether all this worrying about value stability isn’t losing sight of exactly this point—just whose values we are talking about.
As I understand it, the friendly values we are talking about are supposed to be some kind of cleaned-up averaging of the individual values of a population—the species H. sapiens. But as we ought to know from the theory of evolution, the properties of a population (whether we are talking about stature, intelligence, dentition, or values) are both variable within the population and subject to evolution over time. The reason for this change over time is not that the property is changing in any one individual, but rather that the membership in the population is changing.
In my opinion, it is a mistake to try to distill a set of essential values characteristic of humanity and then to try to freeze those values in time. There is no essence of humanity, no fixed human nature. Instead, there is an average (with variance) which has changed over evolutionary time and can be expected to continue to change as the membership in humanity continues to change over time. Most of the people whose values we need to consult in the next millennium have not even been born yet.
If enough people agree with you (and I’m inclined that way myself), then updating will be built into the CEV.
A preemptive caveat and apology: I haven’t yet read everything on this site regarding the issue of FAI.
But something I’m wondering about: why all the fuss about creating a friendly AI, instead of a subservient AI? I don’t want an AI that looks after my interests: I’m an adult and no longer need a daycare nurse. I want an AI that will look after my interests AND obey me—and if these two come into conflict, and I’ve become aware of such conflict, I’d rather it obey me.
Isn’t obedience much easier to program in than human values? Let humans remain the judges of human values. Let AI just use its intellect to obey humans.
It will of course become a dreadful weapon of war, but that’s the case with all technology. It will be a great tool of peacetime as well.
See The Hidden Complexity of Wishes, for example.
That is actually one of the articles I have indeed read: but I didn’t find it that convincing because the human could just ask the genie to describe in advance and in detail the manner in which the genie will behave to obey the man’s wishes—and then keep telling him “find another way” until he actually likes the course of action that the genie describes.
Eventually the genie will be smart enough that it will start by proposing only the courses of action the human would find acceptable—but in the meantime there won’t be much risk, because the man will always be able to veto the unacceptable courses of action.
In short, the issue of “safe” vs. “unsafe” only really arises when we allow the genie unsupervised and unvetoed action. And I reckon that humanity WILL be tempted to allow AIs unsupervised and unvetoed action (e.g. because of cases where AIs could have saved children from burning buildings but couldn’t contact humans qualified to authorize them to do so), and that will be a dreadful temptation and risk.
It’s not just extreme cases like saving children without authorization—have you ever heard someone (possibly a parent) saying that constant supervision is more work than doing the task themselves?
I was going to say that if you can’t trust subordinates, you might as well not have them, but that’s an exaggeration—tools can be very useful. It’s fine that a crane doesn’t have the capacity for independent action, it’s still very useful for lifting heavy objects. [1]
In some ways, you get more safety by doing IA (intelligence augmentation), but while people are probably Friendly (unlikely to destroy the human race), they’re not reliably friendly.
[1] For all I know, these days the taller cranes have an active ability to rebalance themselves. If so, that’s still very limited unsupervised action.
That’s only true if you (the supervisor) know how to perform the task yourself. However, there are a great many tasks that we don’t know how to do, but could evaluate the result if the AI did them for us. We could ask it to prove P!=NP, to write provably correct programs, to design machines and materials and medications that we could test in the normal way that we test such things, etc.
Right. But when you, as a human being with human preferences, decide that you wouldn’t stand in a way of an AGI paperclipper, you’re also using human preferences (the very human meta-preference for one’s preferences to be non-arbitrary), but you’re somehow not fully aware of this.
To put it another way, a truly Paperclipping race wouldn’t feel a similarly reasoned urge to allow a non-Paperclipping AGI to ascend, because “lack of arbitrariness” isn’t a meta-value for them.
So you ought to ask yourself whether it’s your real and final preference that says “human preference is arbitrary, therefore it doesn’t matter what becomes of the universe”, or whether you just believe that you should feel this way when you learn that human preference isn’t written into the cosmos after all. (Because the latter is a mistake, as you realize when you try and unpack that “should” in a non-human-preference-dependent way.)
That isn’t what I feel, by the way. It matters to me which way the future turns out; I am just not yet certain on what metric to compare the desirability to me of various volumes of future space. (Indeed, I am pessimistic on being able to come up with anything more than a rough sketch of such a metric.)
I mean, consider two possible futures: in the first, you have a diverse set of less advanced paperclippers (some want paperclips, others want staples, and so on). How do you compare that with a single, more technically advanced paperclipper? Is it unambiguously obvious the unified paperclipper is worse than the diverse group, and that the more advanced is worse than the less advanced?
When you realize that humans are paperclippers designed by an idiot, the question becomes a lot more difficult to answer.
I think that “uFAI paperclips us all” set to one million negative utilons is three to four orders of magnitude too low. But our particular estimates should have wide error bars, for none of us have much experience in estimating AI risks.
It’s a finite loss (6.8x10^9 multiplied by loss of 1 human life) but I definitely understand why it looks infinite: it is often presented as the biggest possible finite loss.
That’s part and parcel of the Scary Idea—that AI is one small field, part of a very select category of fields, that actually do carry the chance of biggest loss possible. The Scary Idea doesn’t apply to most areas, and in most areas you don’t need hyperbolic caution. Developing drugs, for example: You don’t need a formal proof of the harmlessness of this drug, you can just test it on rats and find out. If I suggested that drug development should halt until I have a formal proof that, when followed, cannot produce harmful drugs, I’d be mad. But if testing it on rats would poison all living things, and if a complex molecular simulation inside a computer could poison all living things as well, and out of the vast space of possible drugs, most of them would be poisonous… well, the caution would be warranted.
Would you be willing to fire a gun in any of the following three situations, ordered from most preferred to least preferred? 1) it is pointed at a target, and hitting the target will benefit you; 2) it is pointed at another human, and would kill them but not you; 3) it is pointed at your own head, and would destroy you.
I don’t think you actually hold this view. It is logically inconsistent with practices like eating food.
It might not be. He has certain short-term goals of the form “while I’m alive, I’d like to do X”; that’s very different from goals connected to the general success of humanity.
Oops, “logically inconsistent” was way too strong. I got carried away with making a point. I was reasoning that: “eat food” is an evolutionary drive; “produce descendants that survive” is also an evolutionary drive; and “a human future” wholly contains futures where his descendants survive. From that I concluded that it is unlikely he has no evolutionary drives—I didn’t consider the possibility that he is missing some evolutionary drives, including all the ones that require a human future—and therefore that he is tied to a human future, but finds it expedient for other reasons (contrarian signaling, not admitting defeat in an argument) to claim he isn’t.
I should have been clearer: I mean that, if we believe in the Scary Idea, there are two effects:
Some set of grandmas die. (finite, comparatively small loss)
Humanity is more likely to go extinct due to an unfriendly AGI. (infinite, comparatively large loss; infinite because of the future humans that would have existed but don’t.)
Now, the benefit of believing the Scary Idea is that humanity is less likely to go extinct due to an unfriendly AGI- but my point is that you are not wagering on separate scales (low chance of infinite gain? Sign me up!) but that you are wagering on the same scale (an unfriendly AGI appears!), and the effects of your wager are unknown.
And who said anything about those descendants having to be human?
This answers your other question: yes, I would be willing to have children normally, I would be willing to kill to protect my children, and I would be willing to die to protect my children.
The best-case scenario is that we can have those children and they respect (though they surpass) their parents- the worst-case scenario is we die in childbirth. But all of those are things I can be comfortable with.
(I will note that I’m assuming here the AGI surpasses us. It’s not clear to me that a paperclip-maker does, but it is clear to me that there can be an AGI who is unfriendly solely because we are inconvenient and does surpass us. So I would try and make sure it doesn’t just focus on making paperclips, but wouldn’t focus too hard on making sure it wants me to stick around.)
Well, the worst-case scenario is that you die in childbirth and take the entire human race with you. That is not something I am comfortable with, regardless of whether you are. And you said you are willing to kill to protect your children. Don’t you think some of the Scary Idea proponents could be parents who don’t want to see their kids die because you gave birth to an AI?
I suspect we are at most one more iteration from mutual understanding; we certainly are rapidly approaching it.
If you believe that an AGI will FOOM, then all that matters is the first AGI made. There is no prize for second place. A belief in the Scary Idea has two effects: it makes your AGI more likely to be friendly (since you’re more careful!) and it makes the AGI less likely to be your AGI (since you’re more careful).
Now, one can hope that the Scary Idea meme’s second effect won’t matter, because the meme is so infectious- all you need to do is infect every AI researcher in the world, and now everyone will be more careful and no one will have a carefulness speed disadvantage. But there are two bits of evidence that make that a poor strategy: AI researchers who are familiar with the argument and don’t buy it, and people who buy the argument, but plan to use it to your disadvantage (since now they’re more likely to define the future than you are!).
The scary idea as a technical argument is weighted on unknown and unpredictable values, and the underlying moral argument (to convince someone they should adopt this reasoning) requires that they believe they should weight the satisfaction of other humans more than their ability to define the future, which is a hard sell.
Thus, my statement is, if you care about your children / your ability to define the future / maximizing the likelihood of a friendly AGI / your personal well-being, then believing in the Scary Idea seems counterproductive.
Ok, holy crap. I am going to call this the Really Scary Idea. I had not thought there could be people out there who would actually value being first with the AGI over decreasing the risk of existential disaster, but it is entirely plausible. Thank you for highlighting this for me, I really am grateful. If a little concerned.
Mind projection fallacy, perhaps? I thought the human race was more important than being the guy who invented AGI, so everyone naturally thinks that?
To reply to my own quote, then:
It doesn’t matter what you are comfortable with, if the developer doesn’t have a term in their utility function for your comfort level. Even I have thought similar thoughts with regards to Luddites and such; drag them kicking and screaming into the future if we have to, etc.
And… mutual understanding in one!
I think the best way to think about it, since it helps keep the scope manageable and crystallize the relevant factors, is that it’s not “being first with the AGI” but “defining the future” (the first is the instrumental value, the second is the terminal value). That’s essentially what all existential risk management is about- defining the future, hopefully to not include the vanishing of us / our descendants.
But how you want to define the future- i.e. the most political terminal value you can have- is not written on the universe. So the mind projection fallacy does seem to apply.
The thing that I find odd, though I can’t find the source at the moment (I thought it was Goertzel’s article, but I didn’t find it by a quick skim; it may be in the comments somewhere), is that the SIAI seems to have had the Really Scary Idea first (we want Friendly AI, so we want to be the first to make it, since we can’t trust other people) and then progressed to the Scary Idea (hmm, we can’t trust ourselves to make a Friendly AI). I wonder if the originators of the Scary Idea forgot the Really Scary Idea or never feared it in the first place?
Making a superintelligence you don’t want before you make the superintelligence you do want has the same consequences as someone else building a superintelligence you don’t want before you build the superintelligence you do want.
You might argue that you could make a less bad superintelligence that you don’t want than someone else, but we don’t care very much about the difference between tiling the universe with paperclips and tiling the universe with molecular smiley faces.
I’m sorry, but I extracted no novel information from this reply. I’m aware that FAI is a non-trivial problem, and I think work done on making AI more likely to be FAI has value.
But that doesn’t mean believing the Scary Idea, or discussing the Scary Idea without also discussing the Really Scary Idea, decreases the existential risk involved. The estimations involved have almost no dependence on evidence, and so it’s just comparison of priors, which does not seem sufficient to make a strong recommendation.
It may help if you view my objections as pointing out that the Scary Idea is privileging a hypothesis, not that the Scary Idea is something we should ignore.
No. Expecting a superintelligence to optimize for our specific values would be privileging a hypothesis. The “Scary Idea” is saying that most likely something else will happen.
I may have to start only writing thousand-word replies, in the hopes that I can communicate more clearly in such a format.
There are two aspects to the issue of how much work should be put into FAI as I understand it. The first I word like this- “the more thought we put into whether or not an AGI will be friendly, the more likely the AGI will be friendly.” The second I word like this- “the more thought we put into making our AGI, the less likely our AGI will be the AGI.” Both are wrapped up in the Scary Idea- the first part is it as normally stated, the second part is its unstated consequence. The value of believing the Scary Idea is the benefit of the first minus the cost of the second.
My understanding is that we have no good estimation of the value of the first aspect or the second aspect. This isn’t astronomy where we have a good idea of the number of asteroids out there and a pretty good idea of how they move through space. And so, to declare that the first aspect is stronger without evidence strikes me as related to privileging the hypothesis.
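That uncertainty can be made concrete with a toy calculation. Every number below is invented purely for illustration (nobody has defended these estimates, and the model is a cartoon), but it shows how the two effects land on the same scale and how their net sign depends on parameters we have no good way to estimate:

```python
# Toy model: P(first AGI is friendly), marginalized over who wins the race.
# All probabilities here are made-up illustrations, not anyone's estimates.

def p_friendly(p_we_win, p_friendly_if_us, p_friendly_if_them):
    """Chance the first AGI is friendly, given who builds it first."""
    return p_we_win * p_friendly_if_us + (1 - p_we_win) * p_friendly_if_them

# Ignoring the Scary Idea: fast but careless.
baseline = p_friendly(p_we_win=0.5, p_friendly_if_us=0.2,
                      p_friendly_if_them=0.1)          # 0.15

# Believing it: more careful (effect 1), but slower (effect 2).
# If caution costs only a little speed, the net effect is positive...
careful_fast = p_friendly(p_we_win=0.4, p_friendly_if_us=0.8,
                          p_friendly_if_them=0.1)      # 0.38

# ...but if caution costs the race almost entirely, it is negative.
careful_slow = p_friendly(p_we_win=0.05, p_friendly_if_us=0.8,
                          p_friendly_if_them=0.1)      # 0.135
```

With one set of made-up numbers caution helps, with another it hurts; which regime we are actually in is exactly the estimate we lack.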
(I should note that I expect, without evidence, the problem of FAI to be simpler than the problem of AGI, and thus don’t think the Scary Idea has any policy implications besides “someone should work on FAI.” The risk that AGI gets solved before FAI means more people should work on FAI, not that fewer people should work on AGI.)
That is not exactly what Goertzel meant by “Scary Idea”. He wrote:
It seems to me that there may be a lot of wiggle room in between failing to “optimize for our specific values” and causing “an involuntary end to the human race”. The human race is not automatically so fragile that it can only survive under the care of a god constructed in our own image.
Yes, what I described was not what Goertzel called the “Scary Idea”, but, in context, it describes the aspect of it that we were discussing.