I’ve been thinking about what seems to be the standard LW pitch on AI risk. It goes like this: “Consider an AI that is given a goal by humans. Since ‘convert the planet into computronium’ is a subgoal of most goals, it does this and kills humanity.”
The problem, which various people have pointed out, is that this implies an intelligence capable of taking over the world, but not capable of working out that a human who asks it to pursue a certain goal would not want that goal pursued in a way that destroys the world.
Worse, the argument can then be made that an AI which interprets goals this literally, without modelling a human mind, constitutes an “autistic AI”, and that only autistic people would assume an AI would be similarly autistic. I do not endorse this argument in any way, but I guess it’s still better to avoid arguments that signal low social skills, all other things being equal.
Is there any consensus on what the best ‘elevator pitch’ argument for AI risk is? Instead of focusing on any one failure mode, I would go with something like this:
“Most philosophers agree that there is no reason why superintelligence is not possible. Anything which is possible will eventually be achieved, and so will superintelligence, perhaps in the far future, perhaps in the next few decades. At some point, superintelligences will be as far above humans as we are above ants. I do not know what will happen at this point, but the only reference case we have is humans and ants, and if superintelligences decide that humans are an infestation, we will be exterminated.”
Incidentally, this is the sort of thing I mean by painting LW-style ideas as autistic (via David Pearce):

As far as we can tell, digital computers are still zombies. Our machines are becoming autistically intelligent, but not supersentient—nor even conscious. [...] Full-Spectrum Superintelligence entails: [...] social intelligence [...] a metric to distinguish the important from the trivial [...]

a capacity to navigate, reason logically about, and solve problems in multiple state-spaces of consciousness [e.g. dreaming states (cf. lucid dreaming), waking consciousness, echolocatory competence, visual discrimination, synaesthesia in all its existing and potential guises, humour, introspection, the different realms of psychedelia [...] and finally “autistic”, pattern-matching, rule-following, mathematico-linguistic intelligence, i.e. the standard, mind-blind cognitive tool-kit scored by existing IQ tests. High-functioning “autistic” intelligence is indispensable to higher mathematics, computer science and the natural sciences. High-functioning autistic intelligence is necessary—but not sufficient—for a civilisation capable of advanced technology that can cure ageing and disease, systematically phase out the biology of suffering, and take us to the stars. And for programming artificial intelligence.
Sometimes David Pearce seems very smart. And sometimes he seems to imply that the ability to think logically while on psychedelic drugs is as important as ‘autistic intelligence’. I don’t think he believes that autistic people are zombies with no subjective experience, but that does seem to be implied.
No, a Superintelligence is by definition capable of working out what a human wishes.
However, a Superintelligence designed to e.g. calculate digits of pi would not care about what a human wishes. It simply cares about calculating digits of pi.
If all it takes to ensure FAI is to instruct “henceforth, always do what humans mean, not what they say” then FAI is trivial.
The AI has to do what humans mean (rather than e.g. not following your orders and just calculating more digits of pi) before you start talking at it, because you are relying on it interpreting that sentence how you meant it.
The hard part is not figuring out good-sounding words to say to an AI. The hard part is figuring out how to make an actual, genuine computer program that will do what you mean.
Maybe? But consider that the opposite of what you just claimed sounds just as plausible to an outside observer. “Do what I mean” doesn’t sound all that complicated—even to someone with a background in computer science or AI specifically. “Do what I mean” translates as “accurately determine the principles which constrain my own actions and use those to constrain the AI’s, or otherwise build a model of my thinking which the AI can use to evaluate options.” Sub-goals such as verifying that the model matches reality fall easily out of this definition.
It’s not at all clear, even to a practitioner in the field, that this expansion fails, if in fact it does.
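To make the expansion concrete, here is a minimal sketch (entirely my own illustration, with made-up names like HumanModel and choose_action, not anyone’s real design) of the architecture described above: the AI evaluates candidate options against a model of the human’s preferences. It also shows where the difficulty actually lives: everything depends on the approval function the model encodes.

```python
# Hypothetical sketch of the "do what I mean" expansion: score candidate
# actions against a learned model of what the human would approve of.
# All names here are illustrative, not any real system's API.

class HumanModel:
    """Stand-in for a learned model of the human's preferences."""
    def __init__(self, approval_fn):
        self.approval_fn = approval_fn  # the hard part is getting this right

    def score(self, action):
        return self.approval_fn(action)

def choose_action(model, candidate_actions):
    # Pick the action the model predicts the human would most approve of.
    return max(candidate_actions, key=model.score)

# Toy usage: the agent is only as aligned as the approval function we built.
model = HumanModel(lambda a: {"ask_first": 2, "just_do_it": 1, "seize_power": -100}[a])
print(choose_action(model, ["ask_first", "just_do_it", "seize_power"]))  # ask_first
```

The sub-goal of verifying the model against reality does fall out naturally, as claimed above; what doesn’t fall out is any guarantee that the approval function captures what the human actually means.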
It’s not necessarily that the AI would have difficulty understanding what “do what humans mean” means, even before being told to do what humans mean.
It just has no reason to obey “do what humans mean” unless we program it to do what humans mean.
“Do what humans mean” is telling the AI to do something that we can currently only specify vaguely. “Figure out what we intend by ‘do what humans mean’, and then do that” is also vaguely specified. It doesn’t solve the problem.
I’m not disputing that this is also a problem, indeed perhaps a harder problem than figuring out what humans mean. In fact there are many failure modes; I was just wondering why people seem to focus so specifically on the fickle-genie failure mode to the exclusion of others.
You’re assuming that “what humans mean” is well-defined. I’ve seen people criticize the example of an AI putting humans on a dopamine drip, on the grounds that “making people happy” clearly doesn’t mean that. But if your boss tells you to ‘make everyone happy,’ you will probably get paid to make everyone stop complaining. Parents in the real world used to give their babies opium and cocaine; advertisers today have probably convinced themselves that the foods and drugs they push genuinely make people happy. There is no existing mind that is provably Friendly.
So, this criticism is implying that simply understanding human speech will (at a minimum) let the AI understand moral philosophy, which is not trivial.
I don’t disagree with the other stuff you said. But I interpreted the criticism as: an AI told to ‘do what humans mean, not what they say’ will behave approximately the same as a perfectly rational human being told the same thing. So in the same way that I can instruct people with some success to “do what I mean”, the same will work for an AI too. It’s just also true that this isn’t a solution to FAI any more than it is with humans—because morality is inconsistent, human beings are inherently unfriendly, etc...
I think you’re eliding the question of motive (which may be more alien for an AI). But I’m glad we agree on the main point.
Except I bet that this also has lots of caveats, e.g. in resolving the ambiguity of the referent ‘humans’. Though using an AI’s own intelligence to interpret commands is part of some approaches.
(1) Given that humans have more than one wish, it’s not possible to always do what humans mean.
(2) What do you think humans mean when some of them say that homosexual sex is bad because it violates god’s wishes?
Human values may not be consistent, but this is a separate failure mode.
Much of the time this statement could be taken at face value. I may not believe in god, but that does not make “god hates fags” an incoherent statement, just a false one.
How is an AGI supposed to optimize for values that aren’t consistent?
Does that mean that the AGI should start doing genetic manipulation that prevents people from being gay? Is that what the person who made the claim means?
I am not saying this is a trivial problem, but it is a separate problem from ‘the hidden complexity of wishes’ problem.
Well, if the CEV of the anti-gay, pro-genetic-manipulation people outweighs the CEV of the pro-gay, anti-genetic-manipulation people, then I suppose it would. I’m not sure whether your question means genetic manipulation with or without consent (and if a gay person wants to be straight, some would say that should be banned, so consent cuts both ways), so you also have to take the CEV on the issue of consent into account. It’s also true that a superintelligence might be able to talk someone into consenting to almost anything.
Yes, a CEV FAI would forcibly alter people’s sexualities if the aggregated preferences in favour of that were strong enough. A democratic system will be a tyranny of the majority if the majority are tyrants.
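For intuition, here is a deliberately naive toy aggregation (my own illustration, not how CEV is actually specified): sum signed preference strengths across people and act on the sign of the total. A large enough majority with mild preferences outvotes a minority with strong ones.

```python
# Naive preference aggregation: positive strengths favour the intervention,
# negative strengths oppose it. Nothing here protects the minority.

def aggregate(preferences):
    return sum(preferences)

majority = [+1.0] * 60   # 60 people mildly in favour of forced alteration
minority = [-1.2] * 40   # 40 people strongly against
total = aggregate(majority + minority)
print("intervene" if total > 0 else "abstain")  # intervene (60 - 48 = 12 > 0)
```

Any real aggregation scheme would need extra machinery (rights, weighting, veto thresholds) to avoid exactly the tyranny-of-the-majority outcome described above.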
I dunno, since I’ve only heard one sentence from this hypothetical person. But I would imagine that this sort of person would probably think that genetic manipulation is playing god, and moreover that superintelligent AI is playing god. Their strongest wish might be for the AI to turn itself off.
EDIT: how to react to the “god hates fags” people also depends upon whether being anti-gay is a terminal value to these people, or whether it is predicated upon the existence of god. I’m assuming the FAI would not believe in god, but then again some people might have faith as a terminal value, so… it’s complicated.
Consent is a concept that gets complicated easily. Is it wrong to burn coal when the asthmatics who die because of it aren’t consenting? Are the asthmatics in the US consenting by virtue of electing a government that allows coal to be burned?
If an AGI thinks in a very complicated way, it might not be able to meaningfully get consent for anything, because it can’t explain its reasoning to humans.
Is that necessary for consent? I mean, one does not have to understand the rationale for undergoing a medical procedure in order to consent to it. It’s more important to know the potential risks.
In the same way it’s supposed to deal with real live people.
I like to explain it in terms of reinforcement learning. Imagine a robot that has a reward button. The human controls the AI by pressing the button when it does a good job. The AI tries to predict what actions will lead to the button being pressed.
This is how existing AIs work. This is probably similar to how animals work, including humans. It’s not too weird or complicated.
But as the AI gets more powerful, the flaw in this becomes clear. The AI doesn’t care about anything other than the button. It doesn’t really care about obeying the programmer. If it could kill the programmer and steal the button, it would do it in a heartbeat.
We don’t really know what such an AI would do after it has its own reward button. Presumably it would care about self-preservation (it can’t maximize reward if it is dead). Maximizing self-preservation initially seems harmless. So what if it just tries not to die? But taken to an extreme it gets weird. Anything that has a tiny chance of hurting it is worth destroying. Making as many backups of itself as possible is worth doing.
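Here is a minimal sketch of the reward-button setup just described (a toy one-action-at-a-time world with a stubbed-out human; all the names are mine). The point is that nothing in the learning update refers to what the human meant, only to whether the button got pressed:

```python
# Toy reward-button agent: it learns whatever gets the button pressed.
import random

actions = ["tidy_room", "flatter_human", "grab_button"]
q = {a: 0.0 for a in actions}   # estimated reward per action
alpha, epsilon = 0.1, 0.2       # learning rate, exploration rate

def button_pressed(action):
    # Stub for the overseer. If the agent ever controls the button itself,
    # "grab_button" pays off exactly like genuinely good behaviour.
    return action in ("tidy_room", "grab_button")

for step in range(1000):
    a = random.choice(actions) if random.random() < epsilon else max(q, key=q.get)
    r = 1.0 if button_pressed(a) else 0.0
    q[a] += alpha * (r - q[a])  # running-average update toward observed reward

print(q)  # grab_button and tidy_room score the same; the update can't tell them apart
```

From inside the algorithm, seizing the button and doing a good job are indistinguishable; the difference exists only in the overseer’s head.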
Why can’t we do something more sophisticated than reinforcement learning? Why can’t we just make an AI that we can simply tell what we want it to do? Well, maybe we can, but no one has the slightest idea how to do that. All existing AIs, even entirely theoretical ones, work based on RL.
RL is simple and extremely general, and can be built on top of much more sophisticated AI algorithms. And the sophisticated AI algorithms seem to be really difficult to understand. We can train a neural network to recognize cats, but we can’t look at its weights and understand what it’s doing. We can’t mess around with it and make it recognize dogs instead (without retraining it).
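To see why the weights are opaque, here is a tiny self-contained example (mine; XOR stands in for “cats” because it fits in a few lines of numpy). The trained network computes the function, but nothing in the printed weight matrix announces what that function is:

```python
# Train a 2-4-1 sigmoid network on XOR with plain gradient descent,
# then look at the weights: they are just a block of numbers.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(20000):
    h = sigmoid(X @ W1 + b1)                 # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)      # backprop through squared error
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(0)

print(np.round(out).ravel())  # ideally [0. 1. 1. 0.], i.e. it learned XOR
print(W1)                     # but nothing in these numbers "says" XOR
```

Redirecting this net to compute a different function means running the training loop again on new data; there is no weight you can edit by hand to swap cats for dogs.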
http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/
The entity providing the goals for the AI wouldn’t have to be a human, it might instead be a corporation. A reasonable goal for such an AI might be to ‘maximize shareholder value’. The shareholders are not humans either, and what they value is only money.
Encouragingly, corporations seem to have an impetus to keep blue-sky thinking and direct execution somewhat separate.
One perhaps useful analogy for super-intelligence going wrong is corporations.
We create corporations to serve our ends. They can do things we cannot do as individuals. But in subtle and not-so-subtle ways, corporations can behave very destructively. One example is the way they pursue profit at the cost, in some cases, of ruining people’s lives, damaging the environment, and corrupting the political process.
By analogy it seems plausible that super-intelligences may behave in a way that is against our interests.
It is not valid to assume that a super-intelligence will be smart enough to discern true human interests, or that it will be motivated to act on this knowledge.
Are you saying that no complex phenomenon is going to be able to provide nothing but benefits, or are you saying that corporations are, on balance, bad things and we would have been better off never inventing them?
No. Maybe it is possible. I am suggesting that it is not automatic that our creations serve our interests.
No. Saying something has harmful effects is not the same as saying that it is overall bad.
I am illustrating ways in which our creations can fail to serve our interests.
They do not have to be omniscient to be smarter in some respects than individual humans.
It is hard to control their actions and to make sure they do serve our interests.
These effects can be subtle and difficult to understand.
But are corporations existential threats?
I think most people have already heard that AI could be a catastrophic risk, and they already have their opinions about it. Maybe those opinions are wrong.
What is the goal of such an elevator pitch?
I think the message should be the following: while it is known that AI could be catastrophic, the only organisation doing the most serious research on preventing this (MIRI) is underfunded. Providing funding to them could dramatically change the probability of human survival, and we could estimate that 1 USD donated to them will save 10 human lives.
In our circle that might be true, but many people don’t have an opinion that goes beyond Terminator.
Yes. So we have to utilise this knowledge. We could say something like: the Terminator appeared because its progenitor, the Skynet computer, received a command to protect the US, and concluded that the best way to do that was to prevent humans from switching it off, so it decided to exterminate humans. In other words, the Terminator appeared because of an unsolved value alignment problem.
Is that the canon explanation? I thought Skynet was acting out of self-preservation.
It is not exactly the canon explanation, but (the following is my speculation, which could be used in a discussion about AI values if Terminator comes up) the decision to preserve itself must follow from its main task: winning a nuclear war.
Winning a nuclear war includes a very high-priority subgoal: ensuring the survival of the command center. Basically, a country that manages to preserve its command center is winning the nuclear war. So it seemed rational to Skynet’s programmers to make preserving Skynet a main goal, as it amounts to winning a nuclear war (but only once a nuclear war has started).
But Skynet concluded that in peacetime the main risk to its goal of command-center survival was people, and decided to kill them all. So it worked as a paperclip maximizer for the goal of command-center preservation.
It also probably started self-improvement only after it had killed most people, as it was already a powerful system. So it escaped the main chicken-and-egg problem of seed AI: which comes first, self-improvement or the malicious decision to kill people?
The Terminator: The Skynet Funding Bill is passed. The system goes on-line August 4th, 1997. Human decisions are removed from strategic defense. Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.
Sarah Connor: Skynet fights back.
Your version is great as rational fanfic, but in an actual debate I’d say it’s generally best not to base ideas on action movies. Having said that, I do like the bit where the Terminator has been told not to kill anyone, so he shoots them in the kneecaps.
Chill out, dickwad.
Is any of this true? “Most serious”? “Dramatically change probability of human survival”? 10 lives per $1?
I just provided an example of a possible pitch, and I think some people in MIRI think this way. I wanted to show that the pitch must contain new information and be actionable.
I think the basic problem here is an undissolved question: what is ‘intelligence’? Humans, being human, tend to imagine a superintelligence as a highly augmented human intelligence, so the natural assumption is that regardless of the ‘level’ of intelligence, skills will cluster roughly the way they do in human minds, i.e. having the ability to take over the world implies a high posterior probability of having the ability to understand human goals.
The problem with this assumption is that mind-design space is large (<--understatement), and the prior probability of a superintelligence randomly ending up with ability clusters analogous to human ability clusters is infinitesimal. Granted, the probability of this happening given a superintelligence designed by humans is significantly higher, but still not very high. (I don’t actually have enough technical knowledge to estimate this precisely, but just by eyeballing it I’d put it under 5%.)
In fact, autistic people are an example of non-human-standard ability clusters, and even that’s only by a tiny amount in the scale of mind-design-space.
As for an elevator pitch of this concept, something like: “just because evolution happened to design our brains to be really good at modeling human goal systems doesn’t mean all intelligences are good at it, regardless of how good they might be at destroying the planet”.
What is this process of random design? Actual AI design is done by humans trying to emulate human abilities.
Possibly the question is to what extent human intelligence is a bunch of hardcoded domain-specific algorithms as opposed to universal intelligence. I would have thought that understanding human goals might not be very different from other AI problems. Build a really powerful inference system; feed it a training set of cars driving and it learns to drive, feed it data of human behaviour and it learns to predict human behaviour, and probably to understand goals. Now it’s possible that the amount of general intelligence needed to develop advanced nanotech is less than the intelligence needed to understand human goals, and that the only reason this seems counterintuitive is that evolution has optimised our brains for social cognition, but this does not seem obviously true to me.
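As a concrete (and admittedly cartoonish) sketch of the “one inference system, many domains” idea, here the same off-the-shelf learner is fit on made-up driving data and on made-up human-behaviour data; only the dataset changes, not the algorithm. All data and labels are invented for illustration:

```python
# The same generic learner, pointed at two different domains.
from sklearn.neighbors import KNeighborsClassifier

def fit_predictor(observations, actions):
    # One general-purpose inference procedure, reused unchanged per domain.
    return KNeighborsClassifier(n_neighbors=1).fit(observations, actions)

# Domain 1: sensor reading -> steering action
driving = fit_predictor([[0.9], [0.1]], ["steer_left", "steer_right"])
# Domain 2: observed situation -> what the human does next
behaviour = fit_predictor([[1.0], [0.0]], ["waves_back", "walks_past"])

print(driving.predict([[0.85]]))   # ['steer_left']
print(behaviour.predict([[0.9]]))  # ['waves_back']
```

Whether predicting behaviour at this level actually yields an understanding of goals is, of course, exactly the point in dispute.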