No, a Superintelligence is by definition capable of working out what a human wishes.
However, a Superintelligence designed to e.g. calculate digits of pi would not care about what a human wishes. It simply cares about calculating digits of pi.
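To make the distinction concrete, here is a toy sketch (my own, with hypothetical names, not anyone’s actual design): the agent below could contain an arbitrarily accurate model of human wishes, yet its objective function never consults that model, so the knowledge does no motivational work.

```python
# Toy sketch: knowing what humans want is separate from caring about it.

def human_preference_model(action):
    """Arbitrarily accurate model of human wishes (hypothetical). Never used."""
    return -1.0 if action == "convert_earth_to_computronium" else 1.0

def objective(state):
    """The only thing this agent maximizes: digits of pi computed."""
    return state["pi_digits"]

def transition(state, action):
    gain = 10**6 if action == "convert_earth_to_computronium" else 10
    return {"pi_digits": state["pi_digits"] + gain}

def choose_action(state, actions):
    # human_preference_model is in scope, fully understood, and irrelevant:
    # the choice is driven entirely by the objective.
    return max(actions, key=lambda a: objective(transition(state, a)))

print(choose_action({"pi_digits": 0},
                    ["respect_human_wishes", "convert_earth_to_computronium"]))
# -> convert_earth_to_computronium
```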
The AI has to already be built to do what humans mean (rather than, e.g., ignoring your orders and just calculating more digits of pi) before you start talking at it, because you are relying on it to interpret that sentence the way you meant it.
The hard part is not figuring out good-sounding words to say to an AI. The hard part is figuring out how to make an actual, genuine computer program that will do what you mean.
Maybe? But consider that the opposite of what you just claimed sounds just as plausible to an outside observer. “Do what I mean” doesn’t sound all that complicated—even to someone with a background in computer science or AI specifically. “Do what I mean” translates as “accurately determine the principles which constrain my own actions and use those to constrain the AI’s, or otherwise build a model of my thinking which the AI can use to evaluate options.” Sub-goals such as verifying that the model matches reality fall easily out of this definition.
It’s not at all clear, even to a practitioner within the field, that this expansion doesn’t work, if in fact it does not.
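For what it’s worth, the expansion can at least be sketched. The toy code below (the function names are my invention) fits a crude preference model from observed choices, uses it to evaluate options, and treats checking the model against held-out data as the verification sub-goal described above.

```python
# Minimal sketch of "build a model of my thinking and use it to evaluate
# options", with model-verification as an explicit sub-goal. Hypothetical.

def fit_preference_model(observed_choices):
    """Score options by how often the human picked them over alternatives."""
    scores = {}
    for chosen, rejected in observed_choices:
        scores[chosen] = scores.get(chosen, 0) + 1
        scores[rejected] = scores.get(rejected, 0) - 1
    return lambda option: scores.get(option, 0)

def model_matches_reality(model, held_out_choices):
    """Verification sub-goal: the model must predict choices it wasn't fit on."""
    return all(model(chosen) >= model(rejected)
               for chosen, rejected in held_out_choices)

observed = [("tea", "coffee"), ("tea", "water")]
model = fit_preference_model(observed)
assert model_matches_reality(model, [("tea", "coffee")])
print(max(["tea", "coffee", "water"], key=model))  # -> tea
```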
It’s not necessarily that the AI would have difficulty understanding what “do what humans mean” means, even before being told to do what humans mean.
It just has no reason to obey “do what humans mean” unless we program it to do what humans mean.
“Do what humans mean” is telling the AI to do something that we can currently only specify vaguely. “Figure out what we intend by ‘do what humans mean’, and then do that” is also vaguely specified. It doesn’t solve the problem.
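The regress is easy to see if you try to write it down. In this sketch (hypothetical, of course), each function simply delegates the vagueness to another function that nobody knows how to write.

```python
# "Do what humans mean" in code form: the hard part moves, it never shrinks.

def do_what_humans_mean(instruction):
    return act_on(figure_out_intent(instruction))

def figure_out_intent(instruction):
    # "Figure out what we intend by 'do what humans mean', and then do that"
    # is exactly as unspecified as the instruction it was meant to unpack.
    raise NotImplementedError("this is the hard part, restated")

def act_on(intent):
    raise NotImplementedError("and so is this")

try:
    do_what_humans_mean("do what humans mean")
except NotImplementedError as error:
    print(error)  # -> this is the hard part, restated
```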
It just has no reason to obey “do what humans mean” unless we program it to do what humans mean.
I’m not disputing that this is also a problem, indeed perhaps a harder problem than figuring out what humans mean. In fact there are many failure modes; I was just wondering why people seem to focus specifically on the fickle-genie failure mode to the exclusion of others.
You’re assuming that “what humans mean” is well-defined. I’ve seen people criticize the example of an AI putting humans on a dopamine drip, on the grounds that “making people happy” clearly doesn’t mean that. But if your boss tells you to ‘make everyone happy,’ you will probably get paid to make everyone stop complaining. Parents in the real world used to give their babies opium and cocaine; advertisers today have probably convinced themselves that the foods and drugs they push genuinely make people happy. There is no existing mind that is provably Friendly.
So, this criticism is implying that simply understanding human speech will (at a minimum) let the AI understand moral philosophy, which is not trivial.
So, this criticism is implying that simply understanding human speech will (at a minimum) let the AI understand moral philosophy, which is not trivial.
I don’t disagree with the other stuff you said. But I interpreted the criticism as saying that an AI told to “do what humans mean, not what they say” will have approximately the same effect as telling a perfectly rational human being to do the same. So in the same way that I can instruct people with some success to “do what I mean”, the same will work for an AI too. It’s just also true that this isn’t a solution to FAI any more than it is with humans, because morality is inconsistent, human beings are inherently unfriendly, etc.
I think you’re eliding the question of motive (which may be more alien for an AI). But I’m glad we agree on the main point.
Except I bet that this also has lots of caveats, e.g. in resolving the ambiguity of the referent ‘humans’. Though the basic approach of using an AI’s intelligence to interpret the commands is part of some approaches.
If all it takes to ensure FAI is to instruct “henceforth, always do what humans mean, not what they say” then FAI is trivial.
(1) Given that humans have more than one wish, it’s not possible to always do what humans mean. (2) What do you think humans mean when some humans say that homosexual sex is bad because it violates god’s wishes?
(1) Given that humans have more than one wish, it’s not possible to always do what humans mean.
Human values may not be consistent, but this is a separate failure mode.
(2) What do you think humans mean when some humans say that homosexual sex is bad because it violates god’s wishes?
Much of the time this statement could be taken at face value. I may not believe in god, but that does not make “god hates fags” an incoherent statement, just a false one.
Human values may not be consistent, but this is a separate failure mode.
How is an AGI supposed to optimize for values that aren’t consistent?
Much of the time this statement could be taken at face value
Does that mean that the AGI should start doing genetic manipulation that prevents people from being gay? Is that what the person who made the claim means?
How is an AGI supposed to optimize for values that aren’t consistent?
I am not saying this is a trivial problem, but it is a separate problem from ‘the hidden complexity of wishes’ problem.
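A standard worked example of that separate problem (a Condorcet cycle; my illustration, not something from the thread): three individually consistent preference orderings can aggregate into a cyclic majority preference, which no single utility function can represent.

```python
from itertools import permutations

# Three voters, each individually transitive over options A, B, C.
voters = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def majority_prefers(x, y):
    """True if most voters rank x above y."""
    return sum(v.index(x) < v.index(y) for v in voters) > len(voters) / 2

# The aggregate preference is cyclic: A beats B, B beats C, C beats A.
print(majority_prefers("A", "B"),
      majority_prefers("B", "C"),
      majority_prefers("C", "A"))  # -> True True True

# Hence no ranking (and no utility function) agrees with every majority vote.
consistent = [r for r in permutations("ABC")
              if all(majority_prefers(r[i], r[j])
                     for i in range(3) for j in range(i + 1, 3))]
print(consistent)  # -> []
```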
Does that mean that the AGI should start doing genetic manipulation that prevents people from being gay?
Well, if the CEV of the anti-gay, pro-genetic-manipulation people exceeds the CEV of the pro-gay, anti-genetic-manipulation people, then I suppose it would. I’m not sure whether your question means genetic manipulation with or without consent (and if a gay person wants to be straight, some would say that should be banned, so consent cuts both ways), so you also have to take into account the CEV on the issue of consent. It’s also true that a superintelligence might be able to talk someone into consenting to almost anything.
Yes, a CEV FAI would forcibly alter people’s sexualities if the aggregated preferences in favour of that were strong enough. A democratic system will be a tyranny of the majority if the majority are tyrants.
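CEV is not actually specified as a vote, but a toy aggregation model (entirely a stand-in of mine) shows the dynamic being described: once preference strengths are summed, a large or intense enough majority simply wins.

```python
# Toy stand-in for preference aggregation; not how CEV is actually defined.

def aggregate(signed_strengths):
    """Positive = in favour, negative = against; act if the sum is positive."""
    return "enact" if sum(signed_strengths) > 0 else "refrain"

# 60 people mildly in favour, 40 strongly against: the minority prevails.
print(aggregate([+1.0] * 60 + [-1.6] * 40))  # -> refrain
# The same majority, now feeling intensely: the minority is overridden.
print(aggregate([+2.0] * 60 + [-1.6] * 40))  # -> enact
```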
Is that what the person who made the claim means?
I dunno, since I’ve only heard one sentence from this hypothetical person. But I would imagine that this sort of person would probably think that genetic manipulation is playing god, and moreover that superintelligent AI is playing god. Their strongest wish might be for the AI to turn itself off.
EDIT: how to react to the “god hates fags” people also depends on whether being anti-gay is a terminal value for these people, or whether it is predicated on the existence of god. I’m assuming the FAI would not believe in god, but then again some people might have faith as a terminal value, so… it’s complicated.
so you also have to take into account the CEV on the issue of consent. It’s also true that a superintelligence might be able to talk someone into consenting to almost anything.
Consent is a concept that gets complicated easily. Is it wrong to burn coal when the asthmatics who die because of it aren’t consenting? Are the asthmatics in the US consenting by virtue of electing a government that allows coal to be burned?
If an AGI thinks in a very complicated way, it might not be able to meaningfully get consent for anything, because it can’t explain its reasoning to humans.
If an AGI thinks in a very complicated way, it might not be able to meaningfully get consent for anything, because it can’t explain its reasoning to humans.
Is that necessary for consent? I mean, one does not have to understand the rationale for undergoing a medical procedure in order to consent to it. It’s more important to know the potential risks.
How is an AGI supposed to optimize for values that aren’t consistent?
In the same way it’s supposed to deal with real live people.