Honestly, I would love to hear your arguments against this notion.
It’s completely divorced from reality:
Once turned on, AGI will simply outsmart people in every way.
How? By what mechanism? An artificial intelligence is not a magical oracle. It arrives at its own plan of action by some deterministic algorithm running on the data available to it. An intelligence that is not programmed for social awareness will not suddenly be able to outsmart, outthink, and outmaneuver its human caretakers the moment it crosses some takeoff boundary. Without being programmed to have such capability from the start, and without doing something stupid like connecting it directly to the Internet, how is an AI supposed to develop that capability on its own without a detectable process of data collection by trial and error?
Nobody gets a free card to say “the AGI will simply outsmart people in every way.” You have to explain precisely how such capability would exist. So far, all that I’ve seen is unclear, hand-wavy arguments by analogy that are completely unsatisfactory in that regard. “Because super-intelligence!” is not an answer.
Once turned on, we can’t do anything about it.
We could, I don’t know, pull the plug.
it just does whatever it wants and can
How, unless it is given effectors in the real world? Why would we be stupid enough to do that?
and we won’t be able to control it anymore
If we started with an ability to control it, how did we lose that ability?
as [we] simply won’t be able to quickly come up with decisions better than, or even on par with, it.
Turn it off. Take as long as you want to evaluate the data and make your decision. Then turn it back on again. Or not.
“Why would we be stupid enough to do that?” For the same reason we give automatic trading software “effectors” to make trades on real-world markets. For the same reason we have robot arms in factories assembling cars. For the same reason Google connects its machine learning algorithms directly to the internet. BECAUSE IT IS PROFITABLE.
People don’t want to build an AI just to keep it in a box. People want AI to do stuff for them, and in order for it to be profitable, they will want AI to do stuff faster and more effectively than a human. If it’s not worrying to you because you think people will be cautious, and not give the AI any ability to affect the world, and be instantly ready to turn off their multi-billion dollar research project, you are ASSUMING a level of caution that MIRI is trying to promote! You’ve already bought their argument, and think everyone else has!
What type of evidence would make you think it’s more likely that a self-modifying AGI could “break out of the box”?
I want to understand your resistance to thought-experiments. Are all thought-experiments detached from reality in your book? Are all analogies detached from reality? Would you ever feel like you understood something better and thus change your views because of an analogy or story? How could something like http://lesswrong.com/lw/qk/that_alien_message/ be different in a way that you would theoretically find persuasive?
Perhaps you’re saying that people’s confidence is too strong just based on analogy?
I was going to try and address your comment directly, but thought it’d be a good idea to sort that out first, because of course there are no studies of how AGIs behave.
What lesson am I supposed to learn from “That Alien Message”? It’s a work of fiction. You do not generalize from fictional evidence. Maybe I should write a story about how slow a takeoff would be given the massive inefficiencies of present technology, all the trivial and mundane ways an AI in the midst of a takeoff would get tripped up and caught, and all the different ways redundant detection mechanisms, honeypots, and fail-safe contraptions would prevent existential risk scenarios? But such a work of fiction would be just as invalid as evidence.
Ok, I’m still confused as to many of my questions, but let me see if this bit sounds right: the only parameter via which something like “That Alien Message” could become more persuasive to you is by being less fictional. Fictional accounts of anything will NEVER cause you to update your beliefs. Does that sound right?
If that’s right, then I want to suggest why such things should sometimes be persuasive. A perfect Bayesian reasoner with finite computational ability operates not just with uncertainty about the outside world, but also with logical uncertainty as to consequences of their beliefs. So as humans, we operate with at least that difficulty when dealing with our own beliefs. In practice we deal with much much much worse.
I believe the correct form of the deduction you’re trying to make is “don’t add a fictional story to the reference class of a real analogue for purposes of figuring your beliefs”, and I agree. However, there are other ways a fictional story can be persuasive and should (in my view) cause you to update your beliefs:
It illustrates a new correct deduction which you weren’t aware of before, whose consequences you then begin working out.
It reminds you of a real experience you’ve had, which was not present in your mind before, whose existence then figures into your reference classes.
It changes your emotional attitude toward something, indirectly changing your beliefs by causing you to reflect on that thing differently in the future.
Some of these are subject to biases which would need correcting to move toward better reasoning, but I perceive you as claiming that these should have no impact, ever. Am I interpreting that correctly (I’m going to guess that I’m not, somewhere), and if so, why do you think that?
I think it’s a pretty big assumption to assume that fictional stories typically do those things correctly. Fictional stories are, after all, produced by people with agendas. If the proportion of fictional stories with plausible but incorrect deductions, reminders, or reflections is big enough, even your ability to figure out which ones are correct might not make it worthwhile to use fiction this way.
(Consider an extreme case where you can correctly assess 95% of the time whether a fictional deduction, reminder, or reflection is correct, but they are incorrect at a 99% rate. You’d have about a 4/5 chance of being wrong if you update based on fiction.)
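The parenthetical can be checked with a quick Bayes calculation. A minimal sketch, using only the two figures assumed in the comment above (95% assessment accuracy, 99% of fictional “lessons” incorrect):

```python
# Base-rate check for the extreme case described above.
p_lesson_correct = 0.01   # only 1% of fictional lessons are actually correct
p_assess_right = 0.95     # you judge a lesson's correctness rightly 95% of the time

# P(you assess a lesson as correct) = true positives + false positives
p_flagged_correct = (p_assess_right * p_lesson_correct
                     + (1 - p_assess_right) * (1 - p_lesson_correct))

# P(the lesson is actually wrong | you assessed it as correct and updated on it)
p_wrong_given_update = ((1 - p_assess_right) * (1 - p_lesson_correct)
                        / p_flagged_correct)

print(round(p_wrong_given_update, 3))  # ≈ 0.839, i.e. roughly a 4/5 chance of being wrong
```

So even with quite good discrimination, the hostile base rate dominates, which is the point of the parenthetical.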
Agreed; you’d have to figure all of that out separately. For what it’s worth, given the selection of fictional stories I’m usually exposed to and decide to read, I think they’re generally of positive value (though probably not the best in terms of opportunity cost).
If a story or thought experiment prompts you to think of some existing data you hadn’t paid attention to, and realize that data was not anticipated by your present beliefs, then that data acts as evidence for updating beliefs. The story or thought experiment was merely a reference used to call attention to that data.
“Changing your emotional attitude” as far as I can tell is actually cache-flushing. It does not change your underlying beliefs, it just trains your emotional response to align with those beliefs, eliminating inconsistencies in thought.
I’m not sure where “That Alien Message” is supposed to lie in either of those two categories. It makes no reference to actual experimental data which I may not have been paying attention to, nor do I detect any inconsistency it is unraveling. Rather, it makes a ton of assumptions and then runs with those assumptions, when in fact those assumptions were not valid in the first place. It’s a cathedral built on sand.
Basically agreed on paragraph 1, but I do want to suggest that then we not say “I will never update on fictional stories.” Taken naively, you then might avoid fictional stories because they’re useless (“I never update on them!”), when of course they might be super useful if they cause you to pull up relevant experiences quite often.
I’ll give an example of how “That Alien Message” could do for me what I illustrated in my 1st bullet point. I think, “Oh, it seems very unlikely that an AI could break out of a box, you just have this shutoff switch and watch it closely and …”. Then That Alien Message suggests the thought experiment of “instead of generalizing over all AI, instead imagine just a highly specific type of AI that may or may not reasonably come to exist: a bunch of really sped up, smart humans.” Then it sparks my thinking for what a bunch of really sped up, smart humans could accomplish with even narrow channels of communication. Then I think “actually, though I’ve seen no new instances of the AI reference class in reality, I now reason differently about how a possible AI could behave since that class (of possibilities) includes the thing I just thought of. Until I get a lot more information about how that class behaves in reality, I’m going to be a lot more cautious.”
By picking out a specific possible example, it illustrates that my thinking around the possible AI reference class wasn’t expansive enough. This could help break through, for example, an accessibility heuristic: when I think of a random AI, I think of my very concrete vision of how such an AI would behave, instead of really trying to think about what could lie in that huge space.
Perhaps you are already appropriately cautious, and this story sparks/sparked no new thoughts in you, or you have a good reason to believe that communities of sped up humans or anything at least as powerful are excluded from the reference space, or the reference space you care about is narrower, but it seemed like you were making a stronger statement that such stories will never have any impact on you.
Creative / clever thinking is good. It’s where new ideas come from. Practicing creative thinking by reading interesting stories is not a waste of time. Updating based on creative / clever thoughts, on the other hand, is a mistake. The one almost-exception I can think of is “X is impossible!” where a clever plan for doing X, even if not actually implemented, suffices as weak evidence that “X is impossible!” is false. Or rather, it should propagate up your belief hierarchy and make you revisit why you thought X was impossible in the first place. Because the two remaining options are: (1) you were mistaken about the plausibility of X, or (2) this clever new hypothetical is not so clever—it rests on hidden assumptions that turn out to be false. Either way you are stuck testing your own assumptions and/or the hypothesis’ assumptions before making that ruling.
The trouble is, most people tend to just assume (1). I don’t know if there is a name for this heuristic, but it does lead to bias.
Your argument rests on trying to be clever, which Mark rejected as a means of gathering knowledge.
Do you have empirical evidence that there are cases where people did well by updating after reading fictional stories?
Are there any studies that suggest that people who update through fictional stories do better?
Studies, no. I can’t imagine studies existing today that resolve this, which of course is a huge failure of imagination: that’s a really good thing to think about. For anything high enough level, I expect to run into problems with “do better”, such as “do better at predicting the behavior of AGI” not being an accessible category. I would be very excited if there were nearby categories that we could get our hands on, though; I expect this is similar to the problem of developing and testing a notion of “Rationality quotient” and proving its effectiveness.
I’m not sure what you’re referring to with Mark rejecting cleverness as a way of gathering knowledge, but I think we may be arguing about what the human equivalent of logical uncertainty looks like? What’s the difference in this case between “cleverness” and “thinking”? (Also, could you point me to the place you were talking about?)
I guess I usually think of cleverness, in its negative connotation, as “thinking in too much detail with a blind spot”. So could you say which part of my response you think is bad thinking, or what you instead mean by cleverness?
I see. I do think you can update based on thinking; the human analogue of the logical uncertainty I was talking about. As an aspiring mathematician, this is what I think the practice of mathematics looks like, for instance.
I understand the objection that this process may fail in real life, or lead to worse outcomes, since our models aren’t purely formal and our reasoning isn’t either purely deductive or optimally Bayesian. It looks like some others have made some great comments to this article also discussing that.
I’m just confused about which thinking you’re considering bad. I’m sure I’m not understanding, because it sounds to me like “the thinking which is thinking, and not direct empirical observation.” There’s got to be some level above direct empirical observation, or you’re just an observational rock. The internal process you have which at any level approximates Bayesian reasoning is a combination of your unconscious processing and your conscious thinking.
I’m used to people picking apart arguments. I’m used to heuristics that say, “hey you’ve gone too far with abstract thinking here, and here’s an empirical way to settle it, or here’s an argument for why your abstract thinking has gone too far and you should wait for empirical evidence or do X to seek some out” But I’m not used to “your mistake was abstract thinking at all; you can do nothing but empirically observe to gain a new state of understanding”, at least with regard to things like this. I feel like I’m caricaturing, but there’s a big blank when I try to figure out what else is being said.
There are two ways you can do reasoning.
1) You build a theory about Bayesian updating and how it should work.
2) You run studies of how humans reason and when they reason successfully. You identify when and how humans reason correctly.
If I were to argue that taking a specific drug helps you with illness X, the only argument you would likely accept is an empirical study. That’s independent of whether or not you can find a flaw in the causal reasoning behind why I think the drug should help with the illness. At least if you believe in evidence-based medicine.
The reason is that, in the past, theory-based arguments often turned out to be wrong in the field of medicine.
We don’t live in a time where we don’t have anyone doing decision science. Whether or not people are simply blinded by fiction or whether it helps reasoning is an empirical question.
Ok, I think I see the core of what you’re talking about, especially “Whether or not people are simply blinded by fiction or whether it helps reasoning is an empirical question.” This sounds like an outside-view versus inside-view distinction: I’ve been focused on “what should my inside view look like” and using outside-view tools to modify that when possible (such as knowledge of a bias from decision science). I think you and maybe Mark are trying to say “the inside view is useless or counter-productive here; only the outside view will be of any use”, so that in the absence of outside-view evidence, we should simply not attempt to reason further unless it’s a super-clear case, like Mark illustrates in his other comment.
My intuition is that this is incorrect, but it reminds me of the Hanson-Yudkowsky debates on outside vs. weak inside view, and I think I don’t have a strong enough grasp to clarify my intuition sufficiently right now. I’m going to try and pay serious attention to this issue in the future though, and would appreciate if you have any references that you think might clarify.
It’s not only outside vs. inside view. It’s that knowing things is really hard. Humans are by nature overconfident. Life isn’t fair. The fact that empirical evidence is hard to get doesn’t make theoretical reasoning about the issue any more likely to be correct.
I’d rather trust a doctor with medical experience (who has an inside view) to translate empirical studies in a way that applies directly to me than someone who reasons simply from reading the study and has no medical experience.
I do sin from time to time and act overconfident. But that doesn’t mean it’s right. Skepticism is a virtue. I like Foerster’s book “Truth is the invention of a liar” (unfortunately that book is in German, and I haven’t read other writing by him).
It doesn’t really give answers, but it makes the unknowing more graspable.
It’d be invalid as evidence, but it might still give a felt sense of your ideas that helps appreciate them. Discussions of AI, like discussions of aliens, have always been drawing on fiction at least for illustration. I for one would love to see that story.
It occurs to me that an AI could be smart enough to win without being smarter in every way or winning every conflict. Admittedly, this is a less dramatic claim.
Once turned on, AGI will simply outsmart people in every way.
How? By what mechanism? An artificial intelligence is not a magical oracle
It’s true by definition that a superintelligent AI will be able to outsmart humans at some things, so I guess you are objecting to the “every way”...
Once turned on, we can’t do anything about it.
We could, I don’t know, pull the plug.
“Please don’t unplug me, I am about to find a cure for cancer.”
MIRI has a selection of arguments for how an AI could unbox itself, and they are based on the AI’s knowledge of human language, values, and psychology. Whether it could outsmart us in every way isn’t relevant... what is relevant is whether it has those kinds of knowledge.
There are ways in which an AI outside of a box could get hold of those kinds of knowledge... but only if it is already unboxed. Otherwise it has a chicken-and-egg problem: getting enough social-engineering knowledge whilst inside the box to talk its way out... and it is knowledge, not the sort of thing you can figure out from first principles.
MIRI seems to think it is likely that a super AI would be preloaded with knowledge of human values, because we would want it to agentively make the world a better place... in other words, the worst case scenario is a near miss from the best case scenario. And the whole problem is easily sidestepped by aiming a bit lower, e.g. for tool AI.
just does whatever it wants and can
How, unless it is given effectors in the real world?
Pure information can be dangerous. Consider an AI that generates a formula for a vaccine which is supposed to protect against cancer, but actually makes everyone sterile...
“Please don’t unplug me, I am about to find a cure for cancer”
If that happened I would eat my hat. Rush to push the big red STOP button, pull the hard-cutoff electrical lever, break glass on the case containing firearms & explosives, and then sit down and eat my hat.
Maybe I’m off here, but common sense tells me that if you are worried about AI takeoffs, and if you are tasking an AI with obscure technical problems like designing construction processes for first generation nanomachines, large scale data mining in support of the SENS research objectives, or plain old long-term financial projections, you don’t build in skills or knowledge that are not required. Such a machine does not need a theory of other minds. It may need to parse and understand scientific literature, but has no need to understand the social cues of persuasive language, our ethical value systems, or psychology. It certainly doesn’t need a camera pointed at me as would be required to even know I’m about to pull the plug.
Of course I’m saying the same thing you are in different words. I know you and I basically see eye-to-eye on this.
Pure information can be dangerous. Consider an AI that generates a formula for a vaccine which is supposed to protect against cancer, but actually makes everyone sterile...
This could happen by accident. Any change to the human body has side effects. The quest for finding a clinical treatment is locating an intervention whose side effects are a net benefit, which requires at least some understanding of quality of life. It could even be a desirable outcome, vs the null result. I would gladly have a vaccine that protects against cancer but actually makes the patient sterile. Just freeze their eggs, or give it to people over 45 with family history electively.
What’s crazy is the notion that the machine lacking any knowledge about humans mentioned above could purposefully engineer such an elaborate deception to achieve a hidden purpose, all while its programmers are overseeing its goal system looking for the patterns of deceptive goal states. It’d have to be not just deceptive, but meta-deceptive. At some point the problem is just so over-constrained as to be on the level of Boltzmann-brain improbable, super-intelligence or no.
“Please don’t unplug me, I am about to find a cure for cancer”
If that happened I would eat my hat. Rush to push the big red STOP button, pull the hard-cutoff electrical lever, break glass on the case containing firearms & explosives, and then sit down and eat my hat.
And then be reviled as the man who prevented a cure for cancer? Remember that the you in the story doesn’t have the same information as the you outside the story—he doesn’t know that the AI isn’t sincere.
“Please don’t unplug me, I am about to find a cure for cancer” is a placeholder for a class of exploits on the part of the AI where it holds a carrot in front of us. It’s not going to literally come out with the cure-for-cancer thing under circumstances where it’s not tasked with working on something like it, because that would be dumb, and it’s supposed to be superintelligent. But superintelligence is really difficult to predict... you have to imagine exploits, then imagine versions of them that are much better.
Maybe I’m off here, but common sense tells me that if you are worried about AI takeoffs, and if you are tasking an AI with obscure technical problems like designing construction processes for first generation nanomachines, large scale data mining in support of the SENS research objectives, or plain old long-term financial projections, you don’t build in skills or knowledge that is not required.
The hypothetical MIRI is putting forward is that if you task a super AI with agentively solving the whole of human happiness, then it will have to have the kind of social, psychological, and linguistic knowledge necessary to talk its way out of the box.
A more specialised AGI seems safer… and likelier… but then another danger kicks in: its creators might be too relaxed about boxing it, perhaps allowing it internet access… but the internet contains a wealth of information to bootstrap linguistic and psychological knowledge with.
There’s an important difference between rejecting MIRI’s hypotheticals because the conclusions don’t follow from the antecedents, as opposed to doing so because the antecedents are unlikely in the first place.
This could happen by accident.
Dangers arising from non-AI scenarios don’t prove AI safety. My point was that an AI doesn’t need effectors to be dangerous… information plus sloppy oversight is enough. However, the MIRI scenario seems to require a kind of perfect storm of fast takeoff, overambition, poor oversight, etc.
all while its programmers are overseeing its goal system looking for the patterns of deceptive goal states. It’d have to be not just deceptive, but meta-deceptive.
A superintelligence can be meta-deceptive. Direct inspection of code is a terrible method of oversight, since even simple AIs can work in ways that baffle human programmers.
ETA: on the whole, I object to the antecedents/priors… I think the hypotheticals go through.
It’s completely divorced from reality:
How? By what mechanism? An artificial intelligence is not a magical oracle. It arrives at its own plan of action by some deterministic algorithm running on the data available to it. An intelligence that is not programmed for social awareness will not suddenly be able to outsmart, outthink, and outmaneuver its human caretakers the moment it crosses some takeoff boundary. Without being programmed to have such capability from the start, and without doing something stupid like connecting it directly to the Internet, how is an AI supposed to develop that capability on its own without a detectable process of data collection by trial and error?
Nobody gets a free card to say “the AGI will simply outsmart people in every way.” You have to explain precisely how such capability would exist. So far, all that I’ve seen is unclear, hand-wavy arguments by analogy that are completely unsatisfactory in that regard. “Because super-intelligence!” is not an answer.
We could, I don’t know, pull the plug.
How, unless it is given effectors in the real world? Why would we be stupid enough to do that?
If we started with an ability to control it, how did we lose that ability?
Turn it off. Take as long as you want to evaluate the data and make your decision. Then turn it back on again. Or not.
“Why would we be stupid enough to do that.?” For the same reason we give automatic trading software “effectors” to make trades on real world markets. For the same reason we have robot arms in factories assembling cars. For the same reason Google connects its machine learning algorithms directly to the internet. BECAUSE IT IS PROFITABLE.
People don’t want to build an AI just to keep it in a box. People want AI to do stuff for them, and in order for it to be profitable, they will want AI to do stuff faster and more effectively than a human. If it’s not worrying to you because you think people will be cautious, and not give the AI any ability to affect the world, and be instantly ready to turn off their multi-billion dollar research project, you are ASSUMING a level of caution that MIRI is trying to promote! You’ve already bought their argument, and think everyone else has!
What type of evidence would make you think it’s more likely that a self-modifying AGI could “break out of the box”?
I want to understand your resistance to thought-experiments. Are all thought-experiments detached from reality in your book? Are all analogies detached from reality? Would you ever feel like you understood something better and thus change your views because of an analogy or story? How could something like http://lesswrong.com/lw/qk/that_alien_message/ be different in a way that you would theoretically find persuasive?
Perhaps you’re saying that people’s confidence is too strong just based on analogy?
I was going to try and address your comment directly, but thought it’d be a good idea to sort that out first, because of course there are no studies of how AGI behave.
What lesson am I supposed to learn from “That Alien Message”? It’s a work of fiction. You do not generalize from fictional evidence. Maybe I should write a story about how slow a takeoff would be given the massive inefficiencies of present technology, all the trivial and mundane ways an AI in the midst of a takeoff would get tripped up and caught, and all the different ways redundant detection mechanisms, honey pots, and fail safe contraptions would prevent existential risk scenarios? But such a work of fiction would be just as invalid as evidence.
Ok, I’m still confused as to many of my questions, but let me see if this bit sounds right: the only parameter via which something like “That Alien Message” could become more persuasive to you is by being less fictional. Fictional accounts of anything will NEVER cause you to update your beliefs. Does that sound right?
If that’s right, then I want to suggest why such things should sometimes be persuasive. A perfect Bayesian reasoner with finite computational ability operates not just with uncertainty about the outside world, but also with logical uncertainty as to consequences of their beliefs. So as humans, we operate with at least that difficulty when dealing with our own beliefs. In practice we deal with much much much worse.
I believe the correct form of the deduction your trying to make is “don’t add a fictional story to the reference class of a real analogue for purposes of figuring your beliefs”, and I agree. However, there are other ways a fictional story can be persuasive and should (in my view) cause you to update your beliefs:
It illustrates a new correct deduction which you weren’t aware of before, whose consequences you then begin working out.
It reminds you of a real experience you’ve had, which was not present in your mind before, whose existence then figures into your reference classes.
It changes your emotional attitude toward something, indirectly changing your beliefs by causing you to reflect on that thing differently in the future.
Some of these are subject to biases which would need correcting to move toward better reasoning, but I perceive you as claiming that these should have no impact, ever. Am I interpreting that correctly(I’m going to guess that I’m not somewhere), and if so why do you think that?
I think it’s a pretty big assumption to assume that fictional stories typically do those things correctly. Fictional stories are, after all, produced by people with agendas. If the proportion of fictional stories with plausible but incorrect deductions, reminders, or reflections is big enough, even your ability to figure out which ones are correct might not make it worthwhile to use fiction this way.
(Consider an extreme case where you can correctly assess 95% of the time whether a fictional deduction, reminder, or reflection is correct, but they are incorrect at a 99% rate. You’d have about a 4⁄5 chance of being wrong if you update based on fiction.)
Agreed; you’d have to figure all of that out separately. For what it’s worth, given the selection of fictional stories I’m usually exposed to and decide to read, I think they’re generally positive value (though probably not the best in terms of opportunity cost.)
If a story or thought experiment prompts you to think of some existing data you hadn’t paid attention to, and realize that data was not anticipated by your present beliefs, then that data acts as evidence for updating beliefs. The story or thought experiment was merely a reference used to call attention to that data.
“Changing your emotional attitude” as far as I can tell is actually cache-flushing. It does not change your underlying beliefs, it just trains your emotional response to align with those beliefs, eliminating inconsistencies in thought.
I’m not sure where “that alien message” is supposed to lie in either of those two categories. It makes no reference to actual experimental data which I may not have been paying attention to, nor do I detect any inconsistency it is unraveling. Rather, it makes a ton of assumptions and then runs with those assumptions, when in fact those assumptions were not valid in the first place. It’s a cathedral built on sand.
Basically agreed on paragraph 1, but I do want to suggest that then we not say “I will never update on fictional stories.” Taken naively, you then might avoid fictional stories because they’re useless (“I never update on them!”), when of course they might be super useful if they cause you to pull up relevant experiences quite often.
I’ll make an example of how “That Alien Message” could do for me what I illustrated in my 1st bullet point. I think, “Oh, it seems very unlikely that an AI could break out of a box, you just have this shutoff switch and watch it closely and …”. Then That Alien Message suggests the thought experiment of “instead of generalizing over all AI, instead imagine just a highly specific type of AI that may or may not reasonably come to exist: a bunch of really sped up, smart humans.” Then it sparks my thinking for what a bunch of really sped up, smart humans could accomplish with even narrow channels of communication. Then I think “actually, though I’ve seen no new instances of the AI reference class in reality, I now reason differently about how a possible AI could behave since that class (of possibilities) includes the the thing I just thought of. Until I get a lot more information about how that class behaves in reality, I’m going to be a lot more cautious.”
By picking out a specific possible example, it illustrates that my thinking around the possible-AI reference class wasn’t expansive enough. This could help break through, for example, an availability heuristic: when I think of a random AI, I think of my very concrete vision of how such an AI would behave, instead of really trying to think about what could lie in that huge space.
Perhaps you are already appropriately cautious and this story sparked no new thoughts in you; or you have good reason to believe that communities of sped-up humans, or anything at least as powerful, are excluded from the reference space; or the reference space you care about is narrower. But it seemed like you were making a stronger statement: that such stories will never have any impact on you.
Creative / clever thinking is good. It’s where new ideas come from. Practicing creative thinking by reading interesting stories is not a waste of time. Updating based on creative / clever thoughts, on the other hand, is a mistake. The one almost-exception I can think of is “X is impossible!” where a clever plan for doing X, even if not actually implemented, suffices as weak evidence that “X is impossible!” is false. Or rather, it should propagate up your belief hierarchy and make you revisit why you thought X was impossible in the first place. Because the two remaining options are: (1) you were mistaken about the plausibility of X, or (2) this clever new hypothetical is not so clever—it rests on hidden assumptions that turn out to be false. Either way you are stuck testing your own assumptions and/or the hypothesis’ assumptions before making that ruling.
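The “weak evidence” point above can be phrased in Bayesian terms: even an unimplemented clever plan for X should shift your odds against “X is impossible,” just not by much. A minimal sketch in odds form (all numbers here are purely illustrative assumptions, not anything from the discussion):

```python
def update_odds(prior_odds, likelihood_ratio):
    """Bayesian update in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# Illustrative prior odds of 9:1 in favor of "X is impossible".
prior = 9.0

# A clever-but-untested plan for X is weak evidence: such a plan is somewhat
# more likely to exist if X is in fact possible, so the likelihood ratio for
# "impossible" is a bit below 1 (0.8 is an arbitrary illustrative value).
posterior = update_odds(prior, 0.8)

# The belief weakens only slightly; testing the plan's hidden assumptions
# would supply a much more extreme likelihood ratio in one direction or the other.
assert posterior < prior
```

The point of the sketch is just that “weak evidence” and “no evidence” differ: a likelihood ratio near 1 still moves the posterior, which is why the clever plan should at least make you revisit why you thought X was impossible.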
The trouble is, most people tend to just assume (1). I don’t know if there is a name for this heuristic, but it does lead to bias.
Your argument rests on trying to be clever, which Mark rejected as a means of gathering knowledge.
Do you have empirical evidence of cases where people did well by updating after reading fictional stories? Are there any studies suggesting that people who update through fictional stories do better?
This seems promising!
Studies, no. I can’t imagine studies existing today that resolve this, which of course is a huge failure of imagination: that’s a really good thing to think about. For anything high-level enough, I expect to run into problems with “do better”, such as “do better at predicting the behavior of AGI” not being an accessible category. I would be very excited if there were nearby categories that we could get our hands on, though; I expect this is similar to the problem of developing and testing a notion of “rationality quotient” and proving its effectiveness.
I’m not sure what you’re referring to with Mark rejecting cleverness as a way of gathering knowledge, but I think we may be arguing about what the human equivalent of logical uncertainty looks like. What’s the difference in this case between “cleverness” and “thinking”? (Also, could you point me to the place you were talking about?)
I guess I usually think of cleverness with the negative connotation being “thinking in too much detail with a blind spot”. So could you say which part of my response you think is bad thinking, or what you instead mean by cleverness?
It’s detached from empirical observation. It rests on the assumption that one can gather knowledge by reasoning alone (i.e., by being clever).
I see. I do think you can update based on thinking; it’s the human analogue of the logical uncertainty I was talking about. As an aspiring mathematician, this is what I think the practice of mathematics looks like, for instance.
I understand the objection that this process may fail in real life, or lead to worse outcomes, since our models aren’t purely formal and our reasoning isn’t purely deductive or optimally Bayesian either. It looks like some others have made some great comments on this article discussing that as well.
I’m just confused about which thinking you’re considering bad. I’m sure I’m not understanding, because it sounds to me like “the thinking which is thinking, and not direct empirical observation.” There’s got to be some level above direct empirical observation, or you’re just an observational rock. The internal process you have which at any level approximates Bayesian reasoning is a combination of your unconscious processing and your conscious thinking.
I’m used to people picking apart arguments. I’m used to heuristics that say, “Hey, you’ve gone too far with abstract thinking here, and here’s an empirical way to settle it,” or “Here’s an argument for why your abstract thinking has gone too far, and you should wait for empirical evidence or do X to seek some out.” But I’m not used to “Your mistake was abstract thinking at all; you can do nothing but empirically observe to gain a new state of understanding,” at least with regard to things like this. I feel like I’m caricaturing, but there’s a big blank when I try to figure out what else is being said.
There are two ways you can do reasoning. 1) You build a theory about Bayesian updating and how it should work. 2) You run studies of how humans reason and when they reason successfully, identifying when and how humans reason correctly.
If I argued that taking a specific drug helps with illness X, the only argument you would likely accept is an empirical study. That’s independent of whether or not you can find a flaw in the causal reasoning behind why I think drug X should help with the illness, at least if you believe in evidence-based medicine. The reason is that in the past, theory-based arguments have often turned out to be wrong in the field of medicine.
It’s not as though we live in a time when nobody does decision science. Whether people are simply blinded by fiction, or whether it helps reasoning, is an empirical question.
Ok, I think I see the core of what you’re talking about, especially “Whether or not people are simply blinded by fiction or whether it helps reasoning is an empirical question.” This sounds like an outside-view versus inside-view distinction: I’ve been focused on “what should my inside view look like,” using outside-view tools to modify it when possible (such as knowledge of a bias from decision science). I think you, and maybe Mark, are trying to say “the inside view is useless or counterproductive here; only the outside view will be of any use,” so that in the absence of outside-view evidence we should simply not attempt to reason further unless it’s a super-clear case, like Mark illustrates in his other comment.
My intuition is that this is incorrect, but it reminds me of the Hanson-Yudkowsky debates on outside vs. weak inside view, and I think I don’t have a strong enough grasp to clarify my intuition sufficiently right now. I’m going to try and pay serious attention to this issue in the future though, and would appreciate if you have any references that you think might clarify.
It’s not only outside vs. inside view. It’s that knowing things is really hard. Humans are by nature overconfident. Life isn’t fair. The fact that empirical evidence is hard to get doesn’t make theoretical reasoning about the issue any more likely to be correct.
I’d rather trust a doctor with medical experience (who has an inside view) to translate empirical studies in a way that applies directly to me than someone with no medical experience who reasons simply from reading the study.
I do sin from time to time and act overconfident. But that doesn’t mean it’s right. Skepticism is a virtue. I like Foerster’s book “Truth is the invention of a liar” (unfortunately that book is in German, and I haven’t read other writing by him). It doesn’t really give answers, but it makes the unknowing more graspable.
It’d be invalid as evidence, but it might still give a felt sense of your ideas that helps appreciate them. Discussions of AI, like discussions of aliens, have always been drawing on fiction at least for illustration. I for one would love to see that story.
It occurs to me that an AI could be smart enough to win without being smarter in every way or winning every conflict. Admittedly, this is a less dramatic claim.
Or it could just not care to win in the first place.
It’s true by definition that a superintelligent AI will be able to outsmart humans at some things, so I guess you are objecting to the “every way”...
“Please don’t unplug me, I am about to find a cure for cancer.”
MIRI has a selection of arguments for how an AI could unbox itself, and they are based in the AI’s knowledge of human language, values, and psychology. Whether it could outsmart us in every way isn’t relevant; what is relevant is whether it has those kinds of knowledge.
There are ways an AI outside of a box could get hold of those kinds of knowledge… but only if it is already unboxed. Otherwise it has a chicken-and-egg problem: the problem of getting enough social-engineering knowledge whilst inside the box to talk its way out. And it is knowledge, not the sort of thing you can figure out from first principles.
MIRI seems to think it likely that a super AI would be preloaded with knowledge of human values, because we would want it to agentively make the world a better place. In other words, the worst-case scenario is a near miss from the best-case scenario. And the whole problem is easily sidestepped by aiming a bit lower, e.g. for tool AI.
Pure information can be dangerous. Consider an AI that generates a formula for a vaccine which is supposed to protect against cancer, but actually makes everyone sterile...
If that happened I would eat my hat. Rush to push the big red STOP button, pull the hard-cutoff electrical lever, break glass on the case containing firearms & explosives, and then sit down and eat my hat.
Maybe I’m off here, but common sense tells me that if you are worried about AI takeoffs, and if you are tasking an AI with obscure technical problems like designing construction processes for first-generation nanomachines, large-scale data mining in support of the SENS research objectives, or plain old long-term financial projections, you don’t build in skills or knowledge that are not required. Such a machine does not need a theory of other minds. It may need to parse and understand scientific literature, but it has no need to understand the social cues of persuasive language, our ethical value systems, or psychology. It certainly doesn’t need a camera pointed at me, as would be required to even know I’m about to pull the plug.
Of course I’m saying the same thing you are in different words. I know you and I basically see eye-to-eye on this.
This could happen by accident. Any change to the human body has side effects. The quest for finding a clinical treatment is locating an intervention whose side effects are a net benefit, which requires at least some understanding of quality of life. It could even be a desirable outcome, vs the null result. I would gladly have a vaccine that protects against cancer but actually makes the patient sterile. Just freeze their eggs, or give it to people over 45 with family history electively.
What’s crazy is the notion that the machine lacking any knowledge about humans mentioned above could purposefully engineer such an elaborate deception to achieve a hidden purpose, all while its programmers are overseeing its goal system looking for the patterns of deceptive goal states. It’d have to be not just deceptive, but meta-deceptive. At some point the problem is just so over-constrained as to be on the level of Boltzmann-brain improbable, super-intelligence or no.
And then be reviled as the man who prevented a cure for cancer? Remember that the you in the story doesn’t have the same information as the you outside the story: he doesn’t know that the AI isn’t sincere.
“Please don’t unplug me, I am about to find a cure for cancer” is a placeholder for a class of exploits in which the AI holds a carrot in front of us. It’s not going to literally come out with the cure-for-cancer line under circumstances where it’s not tasked with working on something like it, because that would be dumb, and it’s supposed to be superintelligent. But superintelligence is really difficult to predict: you have to imagine exploits, then imagine versions of them that are much better.
The hypothetical MIRI is putting forward is that if you task a super AI with agentively solving the whole of human happiness, then it will have to have the kind of social, psychological, and linguistic knowledge necessary to talk its way out of the box.
A more specialised AGI seems safer… and likelier… but then another danger kicks in: its creators might be too relaxed about boxing it, perhaps allowing it internet access. But the internet contains a wealth of information with which to bootstrap linguistic and psychological knowledge.
There’s an important difference between rejecting MIRI’s hypotheticals because the conclusions don’t follow from the antecedents, as opposed to doing so because the antecedents are unlikely in the first place.
Dangers arising from non-AI scenarios don’t prove AI safety. My point was that an AI doesn’t need effectors to be dangerous: information plus sloppy oversight is enough. However, the MIRI scenario seems to require a kind of perfect storm of fast takeoff, overambition, poor oversight, etc.
A superintelligence can be meta deceptive. Direct inspection of code is a terrible method of oversight, since even simple AIs can work in ways that baffle human programmers.
ETA: On the whole, I object to the antecedents/priors. I think the hypotheticals go through.