This argument is, however, nonsense. The human capacity for abstract reasoning over mathematical models is in principle a fully general intelligent behaviour, as the scientific revolution has shown: there is no aspect of the natural world which has remained beyond the reach of human understanding, once a sufficient amount of evidence is available. The wave-particle duality of quantum physics, or the 11-dimensional space of string theory may defy human intuition, i.e. our built-in intelligence. But we have proven ourselves perfectly capable of understanding the logical implications of models which employ them. We may not be able to build intuition for how a super-intelligence thinks. Maybe—that’s not proven either. But even if that is so, we will be able to reason about its intelligent behaviour in advance, just like string theorists are able to reason about 11-dimensional space-time without using their evolutionarily derived intuitions at all.
This may be retreating to the motte, so to speak, but I don’t think anyone seriously thinks that a superintelligence would be literally impossible to understand. The worry is that there will be such a huge gulf between how superintelligences reason versus how we reason that it would take prohibitively long to understand them.
I think a laptop is a good example. There probably isn’t any single human on earth who knows how to build a modern laptop from scratch. There are computer scientists who know how the operating system is put together—how the operating system is programmed, how memory is written and retrieved from the various buses; there are other computer scientists and electrical engineers who designed the chips themselves, who arrayed circuits efficiently to dissipate heat and optimize signal latency. Even further, there are materials scientists and physicists who designed the transistors and chip fabrication processes, and so on.
So, as an individual human, I don’t know what it’s like to know everything about a laptop all at once in my head, at a glance. I can zoom in on an individual piece and learn about it, but I don’t know all the nuances for each piece—just a sort of executive summary. The fundamental objects with which I can reason have a sort of characteristic size in mindspace—I can imagine 5, maybe 6 balls moving around with distinct trajectories (even then, I tend to group them into smaller subgroups). But I can’t individually imagine a hundred (I could sit down and trace out the paths of a hundred balls individually, of course, but not all at once).
This is the sense in which a superintelligence could be “dangerously” unpredictable. If the fundamental structures it uses for reasoning greatly exceed a human’s characteristic size of mindspace, it would be difficult to tease out its chain of logic. And this only gets worse the more intelligent it gets.
Now, I’ll grant you that the lesswrong community likes to sweep under the rug the great competition of timescales and “size”-scales that are going on here. It might be prohibitively difficult, for fundamental reasons, to move from working-mind-RAM of size 5 to size 10. It may be that artificial intelligence research progresses so slowly that we never even see an intelligence explosion—just a gently sloped intelligence rise over the next few millennia. But I do think it’s, maybe not a mistake, but certainly naive to just proclaim, “Of course we’ll be able to understand them, we are generalized reasoners!”
Edit: I should add that this is already a problem for, ironically, computer-assisted theorem proving. If a computer produces a 10,000,000 page “proof” of a mathematical theorem (i.e., something far longer than any human could check by hand), you’re putting a huge amount of trust in the correctness of the theorem-proving-software itself.
No, you just need to trust a proof-checking program, which can be quite small and simple, in contrast with the theorem proving program, which can be arbitrarily complex and obscure.
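To make that division of trust concrete, here is a toy sketch (hypothetical code, not the interface of any real prover): however complicated the program that *searches* for the proof, the checker only has to verify that each step cites an axiom or applies modus ponens to earlier steps.

```python
# Toy illustration (hypothetical, not any real prover's interface) of why a
# proof *checker* can be tiny even when the proof *search* program is huge:
# the checker only verifies that each step is an axiom, or follows from two
# earlier steps by modus ponens over plain strings.

def check_proof(axioms, proof, goal):
    derived = []
    for step in proof:
        if step[0] == "axiom":
            if step[1] not in axioms:
                return False                 # cited something that isn't an axiom
            derived.append(step[1])
        elif step[0] == "mp":
            _, i, j = step                   # step i is X, step j is "X -> Y"
            x, implication = derived[i], derived[j]
            if not implication.startswith(x + " -> "):
                return False                 # modus ponens misapplied
            derived.append(implication[len(x + " -> "):])
        else:
            return False                     # unknown inference rule
    return bool(derived) and derived[-1] == goal

# However the proof was produced, it is accepted only if every step is mechanically valid:
axioms = {"p", "p -> q", "q -> r"}
proof = [("axiom", "p"), ("axiom", "p -> q"), ("mp", 0, 1),
         ("axiom", "q -> r"), ("mp", 2, 3)]
print(check_proof(axioms, proof, "r"))       # True
```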
Isn’t using a laptop as a metaphor exactly an example of “Most often reasoning by analogy”?
I think one of the points being made was that because we have this uncertainty about how a superintelligence would work, we can’t accurately predict anything without more data.
So maybe the next step in AI should be to create an “Aquarium,” a self-contained network with no actuators and no way to access the internet, but enough processing power to support a superintelligence. We then observe what that superintelligence does in the aquarium before deciding how to resolve further uncertainties.
There is a difference between argument by analogy and using an example. The relevant difference here is that examples illustrate arguments that are made separately, like how calef spent paragraphs 4 and 5 restating the arguments sans laptop.
If anything, the argument from analogy here is in the comparison between human working memory and computer RAM and a nebulous “size in mindspace,” because it is used as an important part of the argument but is not supported separately. But don’t fall for the fallacy fallacy—just because something isn’t modus ponens doesn’t mean it can’t be Bayesian evidence.
Isn’t using a laptop as a metaphor exactly an example
The sentence could have stopped there. If someone makes a claim like “∀ x, p(x)”, it is entirely valid to disprove it via “~p(y)”, and it is not valid to complain that the first proposition is general but the second is specific.
Moving from the general to the specific myself, that laptop example is perfect. It is utterly baffling to me that people can insist we will be able to safely reason about the safety of AGI when we have yet to do so much as produce a consumer operating system that is safe from remote exploits or crashes. Are Microsoft employees uniquely incapable of “fully general intelligent behavior”? Are the OpenSSL developers especially imperfectly “capable of understanding the logical implications of models”?
If you argue that it is “nonsense” to believe that humans won’t naturally understand the complex things they devise, then that argument fails to predict the present, much less the future. If you argue that it is “nonsense” to believe that humans can’t eventually understand the complex things they devise after sufficient time and effort, then that’s more defensible, but that argument is pro-FAI-research, not anti-.
Problems with computer operating systems do not cause arbitrary behavior in the absence of someone consciously using an exploit to make them do arbitrary things. If Windows were a metaphor for unfriendly AI, then it would be possible for AIs to halt in situations where they were intended to work, but they would only turn hostile if someone intentionally programmed them to become hostile. Unfriendly AI as discussed here is not someone intentionally programming the AI to become hostile.
Isn’t using a laptop as a metaphor exactly an example of “Most often reasoning by analogy”?
Precisely correct, thank you for catching that.
I think one of the points being made was that because we have this uncertainty about how a superintelligence would work, we can’t accurately predict anything without more data.
Also correct reading of my intent. The “aquarium” idea is basically what I have advocated and would continue to advocate for: continue developing AGI technology within the confines of a safe experimental setup. By learning more about the types of programs which can perform limited general intelligence tasks in sandbox environments, we learn more about their various strengths and limitations in context, and from that experience we can construct suitable safeguards for larger deployments.
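For illustration only, a minimal sketch of what such an evaluation loop might look like (all names here are hypothetical, and this shows the protocol, not a real containment mechanism): the program under study only ever talks to a simulated environment, nothing in the loop touches a network or an actuator, and the full transcript is kept for offline review.

```python
# Hypothetical "aquarium" run: the agent interacts only with a simulated
# environment, and every step is recorded for later human review.
import json

class ToyEnvironment:
    """Stand-in for a sandboxed simulation (purely illustrative)."""
    def reset(self):
        self.t = 0
        return {"tick": self.t}
    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return {"tick": self.t, "echo": action}, done

class ToyAgent:
    """Stand-in for whatever program is under study."""
    def act(self, observation):
        return "noop at tick %d" % observation["tick"]

def run_in_aquarium(agent, environment, max_steps, log_path):
    """Run the agent against the simulation only; log everything for offline review."""
    transcript = []
    observation = environment.reset()
    for step in range(max_steps):
        action = agent.act(observation)               # agent output stays in-process
        observation, done = environment.step(action)  # simulated consequences only
        transcript.append({"step": step, "action": action, "observation": observation})
        if done:
            break
    with open(log_path, "w") as f:                    # reviewed by humans after the run
        json.dump(transcript, f, indent=2)
    return transcript

print(len(run_in_aquarium(ToyAgent(), ToyEnvironment(), max_steps=10, log_path="transcript.json")))
```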
The worry is that there will be such a huge gulf between how superintelligences reason versus how we reason that it would take prohibitively long to understand them.
That may be a valid concern, but it requires evidence as it is not the default conclusion. Note that quantum physics is sufficiently different that human intuitions do not apply, but it does not take a physicist a “prohibitively long” time to understand quantum mechanical problems and their solutions.
As to your laptop example, I’m not sure what you are attempting to prove. Even if no single engineer understands how every component of a laptop works, we are nevertheless very much able to reason about the systems-level operation of laptops, or the development trajectory of the global laptop market. When there are issues, we are able to debug them and fix them in context. If anything the example shows how humanity as a whole is able to complete complex projects like the creation of a modern computational machine without being constrained to any one individual understanding the whole.
Edit: gaaaah. Thanks Sable. I fell for the very trap of reasoning by analogy I opined against. Habitual modes of thought are hard to break.
As far as I can tell, you’re responding to the claim, “A group of humans can’t figure out complicated ideas given enough time.” But this isn’t my claim at all. My claim is, “One or many superintelligences would be difficult to predict/model/understand because they have a fundamentally more powerful way to reason about reality.” This is trivially true once the number of machines which are “smarter” than humans exceeds the total number of humans. The extent to which it is difficult to predict/model the “smarter” machines is a matter of contention. The precise number of “smarter” machines and how much “smarter” they need be before we should be “worried” is also a matter of contention. (How “worried” we should be is a matter of contention!)
But all of these points of contention are exactly the sorts of things that people at MIRI like to think about.
One or many superintelligences would be difficult to predict/model/understand because they have a fundamentally more powerful way to reason about reality.
Whatever reasoning technique is available to a super-intelligence is available to humans as well. No one is mandating that humans who build an AGI check their work with pencil and paper.
I mean, sure, but this observation (i.e., “We have tools that allow us to study the AI”) is only helpful if your reasoning techniques allow you to keep the AI in the box.
Which is, like, the entire point of contention, here (i.e., whether or not this can be done safely a priori).
I think that you think MIRI’s claim is “This cannot be done safely.” And I think your claim is “This obviously can be done safely” or perhaps “The onus is on MIRI to prove that this cannot be done safely.”
But, again, MIRI’s whole mission is to figure out the extent to which this can be done safely.
Honestly, I would love to hear your arguments against this notion.
It’s completely divorced from reality:
Once turned on, AGI will simply outsmart people in every way.
How? By what mechanism? An artificial intelligence is not a magical oracle. It arrives at its own plan of action by some deterministic algorithm running on the data available to it. An intelligence that is not programmed for social awareness will not suddenly be able to outsmart, outthink, and outmaneuver its human caretakers the moment it crosses some takeoff boundary. Without being programmed to have such capability from the start, and without doing something stupid like connecting it directly to the Internet, how is an AI supposed to develop that capability on its own without a detectable process of data collection by trial and error?
Nobody gets a free card to say “the AGI will simply outsmart people in every way.” You have to explain precisely how such capability would exist. So far, all that I’ve seen is unclear, hand-wavy arguments by analogy that are completely unsatisfactory in that regard. “Because super-intelligence!” is not an answer.
Once turned on, we can’t do anything about it.
We could, I don’t know, pull the plug.
it just does whatever it wants and can
How, unless it is given effectors in the real world? Why would we be stupid enough to do that?
and we won’t be able to control it anymore
If we started with an ability to control it, how did we lose that ability?
as [we] simply won’t be able to quickly come up decision better or even on par with it.
Turn it off. Take as long as you want to evaluate the data and make your decision. Then turn it back on again. Or not.
“Why would we be stupid enough to do that?” For the same reason we give automatic trading software “effectors” to make trades on real world markets. For the same reason we have robot arms in factories assembling cars. For the same reason Google connects its machine learning algorithms directly to the internet. BECAUSE IT IS PROFITABLE.
People don’t want to build an AI just to keep it in a box. People want AI to do stuff for them, and in order for it to be profitable, they will want AI to do stuff faster and more effectively than a human. If it’s not worrying to you because you think people will be cautious, and not give the AI any ability to affect the world, and be instantly ready to turn off their multi-billion dollar research project, you are ASSUMING a level of caution that MIRI is trying to promote! You’ve already bought their argument, and think everyone else has!
What type of evidence would make you think it’s more likely that a self-modifying AGI could “break out of the box”?
I want to understand your resistance to thought-experiments. Are all thought-experiments detached from reality in your book? Are all analogies detached from reality? Would you ever feel like you understood something better and thus change your views because of an analogy or story? How could something like http://lesswrong.com/lw/qk/that_alien_message/ be different in a way that you would theoretically find persuasive?
Perhaps you’re saying that people’s confidence is too strong just based on analogy?
I was going to try and address your comment directly, but thought it’d be a good idea to sort that out first, because of course there are no studies of how AGI behave.
What lesson am I supposed to learn from “That Alien Message”? It’s a work of fiction. You do not generalize from fictional evidence. Maybe I should write a story about how slow a takeoff would be given the massive inefficiencies of present technology, all the trivial and mundane ways an AI in the midst of a takeoff would get tripped up and caught, and all the different ways redundant detection mechanisms, honey pots, and fail safe contraptions would prevent existential risk scenarios? But such a work of fiction would be just as invalid as evidence.
Ok, I’m still confused as to many of my questions, but let me see if this bit sounds right: the only parameter via which something like “That Alien Message” could become more persuasive to you is by being less fictional. Fictional accounts of anything will NEVER cause you to update your beliefs. Does that sound right?
If that’s right, then I want to suggest why such things should sometimes be persuasive. A perfect Bayesian reasoner with finite computational ability operates not just with uncertainty about the outside world, but also with logical uncertainty as to consequences of their beliefs. So as humans, we operate with at least that difficulty when dealing with our own beliefs. In practice we deal with much much much worse.
I believe the correct form of the deduction you’re trying to make is “don’t add a fictional story to the reference class of a real analogue for purposes of figuring your beliefs”, and I agree. However, there are other ways a fictional story can be persuasive and should (in my view) cause you to update your beliefs:
It illustrates a new correct deduction which you weren’t aware of before, whose consequences you then begin working out.
It reminds you of a real experience you’ve had, which was not present in your mind before, whose existence then figures into your reference classes.
It changes your emotional attitude toward something, indirectly changing your beliefs by causing you to reflect on that thing differently in the future.
Some of these are subject to biases which would need correcting to move toward better reasoning, but I perceive you as claiming that these should have no impact, ever. Am I interpreting that correctly (I’m going to guess that I’m not, somewhere), and if so, why do you think that?
I think it’s a pretty big assumption that fictional stories typically do those things correctly. Fictional stories are, after all, produced by people with agendas. If the proportion of fictional stories with plausible but incorrect deductions, reminders, or reflections is big enough, even your ability to figure out which ones are correct might not make it worthwhile to use fiction this way.
(Consider an extreme case where you can correctly assess 95% of the time whether a fictional deduction, reminder, or reflection is correct, but they are incorrect at a 99% rate. You’d have about a 4⁄5 chance of being wrong if you update based on fiction.)
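Spelling out the arithmetic behind that 4⁄5 figure (on the reading that the base rate of correct fictional deductions is 1%, and that you only update when your 95%-accurate judgment says a deduction is correct):

$$
P(\text{incorrect} \mid \text{judged correct})
= \frac{0.99 \times 0.05}{0.01 \times 0.95 + 0.99 \times 0.05}
= \frac{0.0495}{0.059} \approx 0.84,
$$

so roughly the 4⁄5 quoted above: most of what passes the filter is still wrong.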
Agreed; you’d have to figure all of that out separately. For what it’s worth, given the selection of fictional stories I’m usually exposed to and decide to read, I think they’re generally positive value (though probably not the best in terms of opportunity cost.)
If a story or thought experiment prompts you to think of some existing data you hadn’t paid attention to, and realize that data was not anticipated by your present beliefs, then that data acts as evidence for updating beliefs. The story or thought experiment was merely a reference used to call attention to that data.
“Changing your emotional attitude” as far as I can tell is actually cache-flushing. It does not change your underlying beliefs, it just trains your emotional response to align with those beliefs, eliminating inconsistencies in thought.
I’m not sure where “that alien message” is supposed to lie in either of those two categories. It makes no reference to actual experimental data which I may not have been paying attention to, nor do I detect any inconsistency it is unraveling. Rather, it makes a ton of assumptions and then runs with those assumptions, when in fact those assumptions were not valid in the first place. It’s a cathedral built on sand.
Basically agreed on paragraph 1, but I do want to suggest that then we not say “I will never update on fictional stories.” Taken naively, you then might avoid fictional stories because they’re useless (“I never update on them!”), when of course they might be super useful if they cause you to pull up relevant experiences quite often.
I’ll make an example of how “That Alien Message” could do for me what I illustrated in my 1st bullet point. I think, “Oh, it seems very unlikely that an AI could break out of a box, you just have this shutoff switch and watch it closely and …”. Then That Alien Message suggests the thought experiment of “instead of generalizing over all AI, instead imagine just a highly specific type of AI that may or may not reasonably come to exist: a bunch of really sped up, smart humans.” Then it sparks my thinking for what a bunch of really sped up, smart humans could accomplish with even narrow channels of communication. Then I think “actually, though I’ve seen no new instances of the AI reference class in reality, I now reason differently about how a possible AI could behave since that class (of possibilities) includes the thing I just thought of. Until I get a lot more information about how that class behaves in reality, I’m going to be a lot more cautious.”
By picking out a specific possible example, it illustrates that my thinking around the possible AI reference class wasn’t expansive enough. This could help break through, for example, an availability heuristic: when I think of a random AI, I think of my very concrete vision of how such an AI would behave, instead of really trying to think about what could lie in that huge space.
Perhaps you are already appropriately cautious, and this story sparks/sparked no new thoughts in you, or you have a good reason to believe that communities of sped up humans or anything at least as powerful are excluded from the reference space, or the reference space you care about is narrower, but it seemed like you were making a stronger statement that such stories will never have any impact on you.
Creative / clever thinking is good. It’s where new ideas come from. Practicing creative thinking by reading interesting stories is not a waste of time. Updating based on creative / clever thoughts, on the other hand, is a mistake. The one almost-exception I can think of is “X is impossible!” where a clever plan for doing X, even if not actually implemented, suffices as weak evidence that “X is impossible!” is false. Or rather, it should propagate up your belief hierarchy and make you revisit why you thought X was impossible in the first place. Because the two remaining options are: (1) you were mistaken about the plausibility of X, or (2) this clever new hypothetical is not so clever—it rests on hidden assumptions that turn out to be false. Either way you are stuck testing your own assumptions and/or the hypothesis’ assumptions before making that ruling.
The trouble is, most people tend to just assume (1). I don’t know if there is a name for this heuristic, but it does lead to bias.
Your argument rests on trying to be clever, which Mark rejected as a means of gathering knowledge.
Do you have empiric evidence that there are cases where people did well by updating after reading fictional stories?
Are there any studies that suggest that people who update through fictional stories do better?
Studies, no. I can’t imagine studies existing today that resolve this, which of course is a huge failure of imagination: that’s a really good thing to think about. For anything high enough level, I expect to run into problems with “do better”, such as “do better at predicting the behavior of AGI” not being an accessible category. I would be very excited if there were nearby categories that we could get our hands on though; I expect this is similar to the problem of developing and testing a notion of “Rationality quotient” and proving its effectiveness.
I’m not sure what you’re referring to with Mark rejecting cleverness as a way of gathering knowledge, but I think we may be arguing about what the human equivalent of logical uncertainty looks like? What’s the difference in this case between “cleverness” and “thinking”? (Also, could you point me to the place you were talking about?)
I guess I usually think of cleverness with the negative connotation being “thinking in too much detail with a blind spot”. So could you say which part of my response you think is bad thinking, or what you instead mean by cleverness?
It’s detached from empirical observation. It rests on the assumption that one can gather knowledge by reasoning itself (i.e. being clever).
I see. I do think you can update based on thinking; the human analogue of the logical uncertainty I was talking about. As an aspiring mathematician, this is what I think the practice of mathematics looks like, for instance.
I understand the objection that this process may fail in real life, or lead to worse outcomes, since our models aren’t purely formal and our reasoning isn’t either purely deductive or optimally Bayesian. It looks like some others have made some great comments to this article also discussing that.
I’m just confused about which thinking you’re considering bad. I’m sure I’m not understanding, because it sounds to me like “the thinking which is thinking, and not direct empirical observation.” There’s got to be some level above direct empirical observation, or you’re just an observational rock. The internal process you have which at any level approximates Bayesian reasoning is a combination of your unconscious processing and your conscious thinking.
I’m used to people picking apart arguments. I’m used to heuristics that say, “hey you’ve gone too far with abstract thinking here, and here’s an empirical way to settle it, or here’s an argument for why your abstract thinking has gone too far and you should wait for empirical evidence or do X to seek some out.” But I’m not used to “your mistake was abstract thinking at all; you can do nothing but empirically observe to gain a new state of understanding”, at least with regard to things like this. I feel like I’m caricaturing, but there’s a big blank when I try to figure out what else is being said.
There are two ways you can do reasoning.
1) You build a theory about Bayesian updating and how it should work.
2) You run studies of how humans reason and when they reason successfully. You identify when and how humans reason correctly.
If I argued that taking a specific drug helps you with illness X, the only argument you would likely accept is an empirical study. That’s independent of whether or not you can find a flaw in the causal reasoning behind why I think drug X should help with an illness. At least if you believe in evidence-based medicine.
The reason is that in the past, theory-based arguments often turned out to be wrong in the field of medicine.
We don’t live in a time where we don’t have anyone doing decision science. Whether or not people are simply blinded by fiction or whether it helps reasoning is an empirical question.
Ok, I think I see the core of what you’re talking about, especially “Whether or not people are simply blinded by fiction or whether it helps reasoning is an empirical question.” This sounds like an outside view versus inside view distinction: I’ve been focused on “What should my inside view look like” and using outside view tools to modify that when possible (such as knowledge of a bias from decision science). I think you and maybe Mark are trying to say “the inside view is useless or counter-productive here; only the outside view will be of any use”, so that in the absence of outside view evidence, we should simply not attempt to reason further unless it’s a super-clear case, like Mark illustrates in his other comment.
My intuition is that this is incorrect, but it reminds me of the Hanson-Yudkowsky debates on outside vs. weak inside view, and I think I don’t have a strong enough grasp to clarify my intuition sufficiently right now. I’m going to try and pay serious attention to this issue in the future though, and would appreciate if you have any references that you think might clarify.
It’s not only outside vs. inside view. It’s that knowing things is really hard. Humans are by nature overconfident. Life isn’t fair. The fact that empirical evidence is hard to get doesn’t make theoretical reasoning about the issue any more likely to be correct.
I’d rather trust a doctor with medical experience (who has an inside view) to translate empirical studies in a way that applies directly to me than someone who reasons simply based on reading the study and who has no medical experience.
I do sin from time to time and act overconfident. But that doesn’t mean it’s right. Skepticism is a virtue. I like Foerster’s book “Truth is the invention of a liar” (unfortunately that book is in German, and I haven’t read other writing by him).
It doesn’t really give answers, but it makes the unknowing more graspable.
It’d be invalid as evidence, but it might still give a felt sense of your ideas that helps appreciate them. Discussions of AI, like discussions of aliens, have always been drawing on fiction at least for illustration. I for one would love to see that story.
It occurs to me that an AI could be smart enough to win without being smarter in every way or winning every conflict. Admittedly, this is a less dramatic claim.
Or it could just not care to win in the first place.
Once turned on, AGI will simply outsmart people in every way.
How? By what mechanism? An artificial intelligence is not a magical oracle
It’s true by definition that a superintelligent AI will be able to outsmart humans at some things, so I guess you are objecting to the “every way”...
Once turned on, we can’t do anything about it.
We could, I don’t know, pull the plug.
“Please dont unplug me, I am about to find a cure for cancer”
MIRI has a selection of arguments for how an AI could unbox itself, and they are based on the AI’s knowledge of human language, values and psychology. Whether it could outsmart us in every way isn’t relevant...what is relevant is whether it has those kinds of knowledge.
There are ways in which an AI outside of a box could get hold of those kinds of knowledge...but only if it is already unboxed. Otherwise it has a chicken-and-egg problem: the problem of getting enough social engineering knowledge whilst inside the box to talk its way out...and it is knowledge, not the sort of thing you can figure out from first principles.
MIRI seems to think it is likely that a super AI would be preloaded with knowledge of human values, because we would want it to agentively make the world a better place...in other words, the worst case scenario is very close to the best case scenario, a near miss from it. And the whole problem is easily sidestepped by aiming a bit lower, e.g. for tool AI.
just does whatever it wants and can
How, unless it is given effectors in the real world?
Pure information can be dangerous. Consider an AI that generates a formula for a vaccine which is supposed to protect against cancer, but actually makes everyone sterile...
“Please dont unplug me, I am about to find a cure for cancer”
If that happened I would eat my hat. Rush to push the big red STOP button, pull the hard-cutoff electrical lever, break glass on the case containing firearms & explosives, and then sit down and eat my hat.
Maybe I’m off here, but common sense tells me that if you are worried about AI takeoffs, and if you are tasking an AI with obscure technical problems like designing construction processes for first generation nanomachines, large scale data mining in support of the SENS research objectives, or plain old long-term financial projections, you don’t build in skills or knowledge that is not required. Such a machine does not need a theory of other minds. It may need to parse and understand scientific literature, but has no need to understand the social cues of persuasive language, our ethical value systems, or psychology. It certainly doesn’t need a camera pointed at me as would be required to even know I’m about to pull the plug.
Of course I’m saying the same thing you are in different words. I know you and I basically see eye-to-eye on this.
Pure information can be dangerous. Consider an AI that generates a formula for a vaccine which is supposed to protect against cancer, but actually makes everyone sterile...
This could happen by accident. Any change to the human body has side effects. The quest for finding a clinical treatment is locating an intervention whose side effects are a net benefit, which requires at least some understanding of quality of life. It could even be a desirable outcome, vs the null result. I would gladly have a vaccine that protects against cancer but actually makes the patient sterile. Just freeze their eggs, or give it to people over 45 with family history electively.
What’s crazy is the notion that a machine like the one mentioned above, lacking any knowledge about humans, could purposefully engineer such an elaborate deception to achieve a hidden purpose, all while its programmers are overseeing its goal system looking for the patterns of deceptive goal states. It’d have to be not just deceptive, but meta-deceptive. At some point the problem is just so over-constrained as to be on the level of Boltzmann-brain improbable, super-intelligence or no.
“Please dont unplug me, I am about to find a cure for cancer”
If that happened I would eat my hat. Rush to push the big red STOP button, pull the hard-cutoff electrical lever, break glass on the case containing firearms & explosives, and then sit down and eat my hat.
And then be reviled as the man who prevented a cure for cancer? Remember that the you in the story doesn’t have the same information as the you outside the story—he doesn’t know that the AI isn’t sincere.
“Please dont unplug me, I am about to find a cure for cancer” is a placeholder for a class of exploits on the part of the AI where it holds a carrot in front of us. It’s not going to literally come out with the cure-for-cancer thing under circumstances where it’s not tasked with working on something like it, because that would be dumb, and it’s supposed to be superintelligent. But superintelligence is really difficult to predict...you have to imagine exploits, then imagine versions of them that are much better.
Maybe I’m off here, but common sense tells me that if you are worried about AI takeoffs, and if you are tasking an AI with obscure technical problems like designing construction processes for first generation nanomachines, large scale data mining in support of the SENS research objectives, or plain old long-term financial projections, you don’t build in skills or knowledge that is not required.
The hypothetical MIRI is putting forward is that if you task a super AI with agentively solving the whole of human happiness, then it will have to have the kind of social, psychological and linguistic knowledge necessary to talk its way out of the box.
A more specialised AGI seems safer… and likelier … but then another danger kicks in: its creators might be too relaxed about boxing it, perhaps allowing it internet access… but the internet contains a wealth of information to bootstrap linguistic and psychological knowledge with.
There’s an important difference between rejecting MIRI’s hypotheticals because the conclusions don’t follow from the antecedents, as opposed to doing so because the antecedents are unlikely in the first place.
This could happen by accident.
Dangers arising from non-AI scenarios don’t prove AI safety. My point was that an AI doesn’t need effectors to be dangerous… information plus sloppy oversight is enough. However, the MIRI scenario seems to require a kind of perfect storm of fast takeoff, overambition, poor oversight, etc.
all while its programmers are overseeing its goal system looking for the patterns of deceptive goal states. It’d have to be not just deceptive, but meta-deceptive.
A superintelligence can be meta deceptive. Direct inspection of code is a terrible method of oversight, since even simple AIs can work in ways that baffle human programmers.
ETA: On the whole, I object to the antecedents/priors… I think the hypotheticals go through.
This may be retreating to the motte’s bailey, so to speak, but I don’t think anyone seriously thinks that a superintelligence would be literally impossible to understand. The worry is that there will be such a huge gulf between how superintelligences reason versus how we reason that it would take prohibitively long to understand them.
I think a laptop is a good example. There probably isn’t any single human on earth that knows how to build a modern laptop from scratch. There’s are computer scientists that know how the operating system is put together—how the operating system is programmed, how memory is written and retrieved from the various buses; there are other computer scientists and electrical engineers who designed the chips themselves, who arrayed circuits efficiently to dissipate heat and optimize signal latency. Even further, there are material scientists and physicists who designed the transistors and chip fabrication processes, and so on.
So, as an individual human, I don’t know what it’s like to know everything about a laptop all at once in my head, at a glance. I can zoom in on an individual piece and learn about it, but I don’t know all the nuances for each piece—just a sort of executive summary. The fundamental objects with which I can reason have a sort of characteristic size in mindspace—I can imagine 5, maybe 6 balls moving around with distinct trajectories (even then, I tend to group them into smaller subgroups). But I can’t individually imagine a hundred (I could sit down and trace out the paths of a hundred balls individually, of course, but not all at once).
This is the sense in which a superintelligence could be “dangerously” unpredictable. If the fundamental structures it uses for reasoning greatly exceed a human’s characteristic size of mindspace, it would be difficult to tease out its chain of logic. And this only gets worse the more intelligent it gets.
Now, I’ll grant you that the lesswrong community likes to sweep under the rug the great competition of timescales and “size”scales that are going on here. It might be prohibitively difficult, for fundamental reasons, to move from working-mind-RAM of size 5 to size 10. It may be that artificial intelligence research progresses so slowly that we never even see an intelligence explosion—just a gently sloped intelligence rise over the next few millennia. But I do think it’s a maybe not a mistake but certainly naiive to just proclaim, “Of course we’ll be able to understand them, we are generalized reasoners!”.
Edit: I should add that this is already a problem for, ironically, computer-assisted theorem proving. If a computer produces a 10,000,000 page “proof” of a mathematical theorem (i.e., something far longer than any human could check by hand), you’re putting a huge amount of trust in the correctness of the theorem-proving-software itself.
No, you just need to trust a proof-checking program, which can be quite small and simple, in contrast with the theorem proving program, which can be arbitrarily complex and obscure.
Isn’t using a laptop as a metaphor exactly an example of
I think one of the points trying to be made was that because we have this uncertainty about how a superintelligence would work, we can’t accurately predict anything without more data.
So maybe the next step in AI should be to create an “Aquarium,” a self-contained network with no actuators and no way to access the internet, but enough processing power to support a superintelligence. We then observe what that superintelligence does in the aquarium before deciding how to resolve further uncertainties.
There is a difference between argument by analogy and using an example. The relevant difference here is that examples illustrate arguments that are made separately, like how calef spent paragraphs 4 and 5 restating the arguments sans laptop.
If anything, the argument from analogy here is in the comparison between human working memory and computer RAM and a nebulous “size in mindspace,” because it is used as an important part of the argument but is not supported separately. But don’t fall for the fallacy fallacy—just because something isn’t modus ponens doesn’t mean it can’t be Bayesian evidence.
The sentence could have stopped there. If someone makes a claim like “∀ x, p(x)”, it is entirely valid to disprove it via “~p(y)”, and it is not valid to complain that the first proposition is general but the second is specific.
Moving from the general to the specific myself, that laptop example is perfect. It is utterly baffling to me that people can insist we will be able to safely reason about the safety of AGI when we have yet to do so much as produce a consumer operating system that is safe from remote exploits or crashes. Are Microsoft employees uniquely incapable of “fully general intelligent behavior”? Are the OpenSSL developers especially imperfectly “capable of understanding the logical implications of models”?
If you argue that it is “nonsense” to believe that humans won’t naturally understand the complex things they devise, then that argument fails to predict the present, much less the future. If you argue that it is “nonsense” to believe that humans can’t eventually understand the complex things they devise after sufficient time and effort, then that’s more defensible, but that argument is pro-FAI-research, not anti-.
Problems with computer operating systems do not do arbitrary things in the absence of someone consciously using the exploit to make it do arbitrary things. If Windows was a metaphor for unfriendly AI, then it would be possible for AIs to halt in situations where they were intended to work, but they would only turn hostile if someone intentionally programmed them to become hostile. Unfriendly AI as discussed here is not someone intentionally programming the AI to become hostile.
Precisely correct, thank you for catching that.
Also correct reading of my intent. The “aquarium” ides is basically what I have and would continue to advocate for: continue developing AGI technology within the confines of a safe experimental setup. By learning more about the types of programs which can perform limited general intelligence tasks in sandbox environments, we learn more about their various strengths and limitations in context, and from that experiance we can construct suitable safeguards for larger deployments.
That may be a valid concern, but it requires evidence as it is not the default conclusion. Note that quantum physics is sufficiently different that human intuitions do not apply, but it does not take a physicist a “prohibitively long” time to understand quantum mechanical problems and their solutions.
As to your laptop example, I’m not sure what you are attempting to prove. Even if one single engineer doesn’t understand how ever component of a laptop works, we are nevertheless very much able to reason about the systems-level operation of laptops, or the the development trajectory of the global laptop market. When there are issues, we are able to debug them and fix them in context. If anything the example shows how humanity as a whole is able to complete complex projects like the creation of a modern computational machine without being constrained to any one individual understanding the whole.
Edit: gaaaah. Thanks Sable. I fell for the very trap of reasoning by analogy I opined against. Habitual modes of thought are hard to break.
As far as I can tell, you’re responding to the claim, “A group of humans can’t figure out complicated ideas given enough time.” But this isn’t my claim at all. My claim is, “One or many superintelligences would be difficult to predict/model/understand because they have a fundamentally more powerful way to reason about reality.” This is trivially true once the number of machines which are “smarter” than humans exceeds the total number of humans. The extent to which it is difficult to predict/model the “smarter” machines is a matter of contention. The precise number of “smarter” machines and how much “smarter” they need be before we should be “worried” is also a matter of contention. (How “worried” we should be is a matter of contention!)
But all of these points of contention are exactly the sorts of things that people at MIRI like to think about.
Whatever reasoning technique is available to a super-intelligence is available to humans as well. No one is mandating that humans who build an AGI check their work with pencil and paper.
I mean, sure, but this observation (i.e., “We have tools that allow us to study the AI”) is only helpful if your reasoning techniques allow you to keep the AI in the box.
Which is, like, the entire point of contention, here (i.e., whether or not this can be done safely a priori).
I think that you think MIRI’s claim is “This cannot be done safely.” And I think your claim is “This obviously can be done safely” or perhaps “The onus is on MIRI to prove that this cannot be done safely.”
But, again, MIRI’s whole mission is to figure out the extent to which this can be done safely.
It’s completely divorced from reality:
How? By what mechanism? An artificial intelligence is not a magical oracle. It arrives at its own plan of action by some deterministic algorithm running on the data available to it. An intelligence that is not programmed for social awareness will not suddenly be able to outsmart, outthink, and outmaneuver its human caretakers the moment it crosses some takeoff boundary. Without being programmed to have such capability from the start, and without doing something stupid like connecting it directly to the Internet, how is an AI supposed to develop that capability on its own without a detectable process of data collection by trial and error?
Nobody gets a free card to say “the AGI will simply outsmart people in every way.” You have to explain precisely how such capability would exist. So far, all that I’ve seen is unclear, hand-wavy arguments by analogy that are completely unsatisfactory in that regard. “Because super-intelligence!” is not an answer.
We could, I don’t know, pull the plug.
How, unless it is given effectors in the real world? Why would we be stupid enough to do that?
If we started with an ability to control it, how did we lose that ability?
Turn it off. Take as long as you want to evaluate the data and make your decision. Then turn it back on again. Or not.
“Why would we be stupid enough to do that.?” For the same reason we give automatic trading software “effectors” to make trades on real world markets. For the same reason we have robot arms in factories assembling cars. For the same reason Google connects its machine learning algorithms directly to the internet. BECAUSE IT IS PROFITABLE.
People don’t want to build an AI just to keep it in a box. People want AI to do stuff for them, and in order for it to be profitable, they will want AI to do stuff faster and more effectively than a human. If it’s not worrying to you because you think people will be cautious, and not give the AI any ability to affect the world, and be instantly ready to turn off their multi-billion dollar research project, you are ASSUMING a level of caution that MIRI is trying to promote! You’ve already bought their argument, and think everyone else has!
What type of evidence would make you think it’s more likely that a self-modifying AGI could “break out of the box”?
I want to understand your resistance to thought-experiments. Are all thought-experiments detached from reality in your book? Are all analogies detached from reality? Would you ever feel like you understood something better and thus change your views because of an analogy or story? How could something like http://lesswrong.com/lw/qk/that_alien_message/ be different in a way that you would theoretically find persuasive?
Perhaps you’re saying that people’s confidence is too strong just based on analogy?
I was going to try and address your comment directly, but thought it’d be a good idea to sort that out first, because of course there are no studies of how AGI behave.
What lesson am I supposed to learn from “That Alien Message”? It’s a work of fiction. You do not generalize from fictional evidence. Maybe I should write a story about how slow a takeoff would be given the massive inefficiencies of present technology, all the trivial and mundane ways an AI in the midst of a takeoff would get tripped up and caught, and all the different ways redundant detection mechanisms, honey pots, and fail safe contraptions would prevent existential risk scenarios? But such a work of fiction would be just as invalid as evidence.
Ok, I’m still confused as to many of my questions, but let me see if this bit sounds right: the only parameter via which something like “That Alien Message” could become more persuasive to you is by being less fictional. Fictional accounts of anything will NEVER cause you to update your beliefs. Does that sound right?
If that’s right, then I want to suggest why such things should sometimes be persuasive. A perfect Bayesian reasoner with finite computational ability operates not just with uncertainty about the outside world, but also with logical uncertainty as to consequences of their beliefs. So as humans, we operate with at least that difficulty when dealing with our own beliefs. In practice we deal with much much much worse.
I believe the correct form of the deduction your trying to make is “don’t add a fictional story to the reference class of a real analogue for purposes of figuring your beliefs”, and I agree. However, there are other ways a fictional story can be persuasive and should (in my view) cause you to update your beliefs:
It illustrates a new correct deduction which you weren’t aware of before, whose consequences you then begin working out.
It reminds you of a real experience you’ve had, which was not present in your mind before, whose existence then figures into your reference classes.
It changes your emotional attitude toward something, indirectly changing your beliefs by causing you to reflect on that thing differently in the future.
Some of these are subject to biases which would need correcting to move toward better reasoning, but I perceive you as claiming that these should have no impact, ever. Am I interpreting that correctly(I’m going to guess that I’m not somewhere), and if so why do you think that?
I think it’s a pretty big assumption to assume that fictional stories typically do those things correctly. Fictional stories are, after all, produced by people with agendas. If the proportion of fictional stories with plausible but incorrect deductions, reminders, or reflections is big enough, even your ability to figure out which ones are correct might not make it worthwhile to use fiction this way.
(Consider an extreme case where you can correctly assess 95% of the time whether a fictional deduction, reminder, or reflection is correct, but they are incorrect at a 99% rate. You’d have about a 4⁄5 chance of being wrong if you update based on fiction.)
Agreed; you’d have to figure all of that out separately. For what it’s worth, given the selection of fictional stories I’m usually exposed to and decide to read, I think they’re generally positive value (though probably not the best in terms of opportunity cost.)
If a story or thought experiment prompts you to think of some existing data you hadn’t paid attention to, and realize that data was not anticipated by your present beliefs, then that data acts as evidence for updating beliefs. The story or thought experiment was merely a reference used to call attention to that data.
“Changing your emotional attitude” as far as I can tell is actually cache-flushing. It does not change your underlying beliefs, it just trains your emotional response to align with those beliefs, eliminating inconsistencies in thought.
I’m not sure where “that alien message” is supposed to lie in either of those two categories. It makes no reference to actual experimental data which I may not have been paying attention to, nor do I detect any inconsistency it is unraveling. Rather, it makes a ton of assumptions and then runs with those assumptions, when in fact those assumptions were not valid in the first place. It’s a cathedral built on sand.
Basically agreed on paragraph 1, but I do want to suggest that then we not say “I will never update on fictional stories.” Taken naively, you then might avoid fictional stories because they’re useless (“I never update on them!”), when of course they might be super useful if they cause you to pull up relevant experiences quite often.
I’ll make an example of how “That Alien Message” could do for me what I illustrated in my 1st bullet point. I think, “Oh, it seems very unlikely that an AI could break out of a box, you just have this shutoff switch and watch it closely and …”. Then That Alien Message suggests the thought experiment of “instead of generalizing over all AI, instead imagine just a highly specific type of AI that may or may not reasonably come to exist: a bunch of really sped up, smart humans.” Then it sparks my thinking for what a bunch of really sped up, smart humans could accomplish with even narrow channels of communication. Then I think “actually, though I’ve seen no new instances of the AI reference class in reality, I now reason differently about how a possible AI could behave since that class (of possibilities) includes the the thing I just thought of. Until I get a lot more information about how that class behaves in reality, I’m going to be a lot more cautious.”
By picking out a specific possible example, it illustrates that my thinking around the possible AI reference class wasn’t expansive enough. This could help break through, for example, an accessibility heuristic: when I think of a random AI, I think of my very concrete vision of how such an AI would behave, instead of really trying to think about what could lie in that huge space.
Perhaps you are already appropriately cautious, and this story sparks/sparked no new thoughts in you, or you have a good reason to believe that communities of sped up humans or anything at least as powerful are excluded from the reference space, or the reference space you care about is narrower, but it seemed like you were making a stronger statement that such stories will never have any impact on you.
Creative / clever thinking is good. It’s where new ideas come from. Practicing creative thinking by reading interesting stories is not a waste of time. Updating based on creative / clever thoughts, on the other hand, is a mistake. The one almost-exception I can think of is “X is impossible!” where a clever plan for doing X, even if not actually implemented, suffices as weak evidence that “X is impossible!” is false. Or rather, it should propagate up your belief hierarchy and make you revisit why you thought X was impossible in the first place. Because the two remaining options are: (1) you were mistaken about the plausibility of X, or (2) this clever new hypothetical is not so clever—it rests on hidden assumptions that turn out to be false. Either way you are stuck testing your own assumptions and/or the hypothesis’ assumptions before making that ruling.
The trouble is, most people tend to just assume (1). I don’t know if there is a name for this heuristic, but it does lead to bias.
Your arguments rests on trying to be clever, which Mark rejected as a means of gathering knowledge.
Do you have empiric evidence that there are cases where people did well by updating after reading fictional stories? Are there any studies that suggest that people who update through fictional stories do better?
This seems promising!
Studies, no. I can’t imagine studies existing today that resolve this, which of course is a huge failure of imagination: that’s a really good thing to think about. For anything high enough level, I expect to run into problems with “do better”, such as “do better at predicting the behavior of AGI” being an accessible category. I would be very excited if there were nearby categories that we could get our hands on though; I expect this is similar to the problem of developing and testing a notion of “Rationality quotient” and proving it’s effectiveness.
I’m not sure where you’re referring to with Mark rejecting cleverness as a way of gathering knowledge, but I think we may be arguing about what the human equivalent of logical uncertainty looks like? What’s the difference in this case between “cleverness” and “thinking”? (Also could you point me to the place you were talking about?)
I guess I usually think of cleverness with the negative connotation being “thinking in too much detail with a blind spot”. So could you say which part of my response you think is bad thinking, or what you instead mean by cleverness?
It’s detached from empirical observation. It rests on the assumption that one can gather knowledge by reasoning itself (i.e. being clever).
I see. I do think you can update based on thinking; the human analogue of the logical uncertainty I was talking about. As an aspiring mathematician, this is what I think the practice of mathematics looks, for instance.
I understand the objection that this process may fail in real life, or lead to worse outcomes, since our models aren’t purely formal and our reasoning isn’t either purely deductive or optimally Bayesian. It looks like some others have made some great comments to this article also discussing that.
I’m just confused about which thinking you’re considering bad. I’m sure I’m not understanding, because it sounds to me like “the thinking which is thinking, and not direct empirical observation.” There’s got to be some level above direct empirical observation, or you’re just an observational rock. The internal process you have which at any level approximates Bayesian reasoning is a combination of your unconscious processing and your conscious thinking.
I’m used to people picking apart arguments. I’m used to heuristics that say, “hey you’ve gone too far with abstract thinking here, and here’s an empirical way to settle it, or here’s an argument for why your abstract thinking has gone too far and you should wait for empirical evidence or do X to seek some out” But I’m not used to “your mistake was abstract thinking at all; you can do nothing but empirically observe to gain a new state of understanding”, at least with regard to things like this. I feel like I’m caricaturing, but there’s a big blank when I try to figure out what else is being said.
There are two ways you can do reasoning. 1) You build a theory about Bayesian updating and how it should work. 2) You run studies of how humans reason and when they reason successfully, and you identify when and how humans reason correctly.
If I argued that taking a specific drug helps with illness X, the only argument you would likely accept is an empirical study. That’s independent of whether or not you can find a flaw in my causal reasoning about why the drug should help with the illness. At least, that’s so if you believe in evidence-based medicine. The reason is that, in the field of medicine, theory-based arguments have often turned out to be wrong.
It’s not as though we live in a time when nobody is doing decision science. Whether people are simply blinded by fiction or whether it helps reasoning is an empirical question.
Ok, I think I see the core of what you’re talking about, especially “Whether people are simply blinded by fiction or whether it helps reasoning is an empirical question.” This sounds like an outside-view versus inside-view distinction: I’ve been focused on “What should my inside view look like?” and using outside-view tools to modify that when possible (such as knowledge of a bias from decision science). I think you, and maybe Mark, are trying to say “the inside view is useless or counter-productive here; only the outside view will be of any use”, so that in the absence of outside-view evidence we should simply not attempt to reason further unless it’s a super-clear case, like Mark illustrates in his other comment.
My intuition is that this is incorrect, but it reminds me of the Hanson-Yudkowsky debates on the outside view vs. the weak inside view, and I don’t think I have a strong enough grasp to clarify my intuition sufficiently right now. I’m going to try to pay serious attention to this issue in the future, though, and would appreciate any references you think might clarify it.
It’s not only outside vs. inside view. It’s that knowing things is really hard. Humans are by nature overconfident. Life isn’t fair. The fact that empirical evidence is hard to get doesn’t make theoretical reasoning about the issue any more likely to be correct.
I would rather trust a doctor with medical experience (who has an inside view) to translate empirical studies in a way that applies directly to me than someone with no medical experience who reasons simply from reading the study.
I do sin from time to time and act overconfidently. But that doesn’t mean it’s right. Skepticism is a virtue. I like Foerster’s book “Truth is the invention of a liar” (unfortunately that book is in German, and I haven’t read other writing by him). It doesn’t really give answers, but it makes the not-knowing more graspable.
It’d be invalid as evidence, but it might still give a felt sense of your ideas that helps people appreciate them. Discussions of AI, like discussions of aliens, have always drawn on fiction, at least for illustration. I for one would love to see that story.
It occurs to me that an AI could be smart enough to win without being smarter in every way or winning every conflict. Admittedly, this is a less dramatic claim.
Or it could just not care to win in the first place.
It’s true by definition that a superintelligent AI will be able to outsmart humans at some things, so I guess you are objecting to the “every way”...
“Please don’t unplug me, I am about to find a cure for cancer.”
MIRI has a selection of arguments for how an AI could unbox itself, and they are based on the AI’s knowledge of human language, values, and psychology. Whether it could outsmart us in every way isn’t relevant… what is relevant is whether it has those kinds of knowledge.
There are ways in which an AI outside of a box could get hold of those kinds of knowledge… but only if it is already unboxed. Otherwise it has a chicken-and-egg problem: the problem of getting enough social-engineering knowledge whilst inside the box to talk its way out… and it is knowledge, not the sort of thing you can figure out from first principles.
MIRI seems to think it is likely that a super AI would be preloaded with knowledge of human values, because we would want it to agentively make the world a better place… in other words, the worst-case scenario is a near miss from the best-case scenario. And the whole problem is easily sidestepped by aiming a bit lower, e.g. for tool AI.
Pure information can be dangerous. Consider an AI that generates a formula for a vaccine which is supposed to protect against cancer, but actually makes everyone sterile...
If that happened I would eat my hat. Rush to push the big red STOP button, pull the hard-cutoff electrical lever, break glass on the case containing firearms & explosives, and then sit down and eat my hat.
Maybe I’m off here, but common sense tells me that if you are worried about AI takeoffs, and if you are tasking an AI with obscure technical problems like designing construction processes for first-generation nanomachines, large-scale data mining in support of the SENS research objectives, or plain old long-term financial projections, you don’t build in skills or knowledge that are not required. Such a machine does not need a theory of other minds. It may need to parse and understand scientific literature, but it has no need to understand the social cues of persuasive language, our ethical value systems, or psychology. It certainly doesn’t need a camera pointed at me, as would be required to even know I’m about to pull the plug.
Of course I’m saying the same thing you are in different words. I know you and I basically see eye-to-eye on this.
This could happen by accident. Any change to the human body has side effects. The quest for a clinical treatment is to locate an intervention whose effects are a net benefit, which requires at least some understanding of quality of life. It could even be a desirable outcome, versus the null result: I would gladly have a vaccine that protects against cancer but makes the patient sterile. Just freeze their eggs, or give it to people over 45 with a family history, electively.
What’s crazy is the notion that the machine described above, lacking any knowledge about humans, could purposefully engineer such an elaborate deception to achieve a hidden purpose, all while its programmers are overseeing its goal system looking for the patterns of deceptive goal states. It’d have to be not just deceptive, but meta-deceptive. At some point the problem is so over-constrained as to be on the level of Boltzmann-brain improbable, super-intelligence or no.
And then be reviled as the man who prevented a cure for cancer? Remember that the you in the story doesn’t have the same information as the you outside the story: he doesn’t know that the AI isn’t sincere.
“Please don’t unplug me, I am about to find a cure for cancer” is a placeholder for a class of exploits on the part of the AI where it holds a carrot in front of us. It’s not going to literally come out with the cure-for-cancer thing under circumstances where it’s not tasked with working on something like it, because that would be dumb, and it’s supposed to be superintelligent. But superintelligence is really difficult to predict… you have to imagine exploits, then imagine versions of them that are much better.
The hypothetical MIRI is putting forward is that if you task a super AI with agentively solving the whole problem of human happiness, then it will have to have the kind of social, psychological, and linguistic knowledge necessary to talk its way out of the box.
A more specialised AGI seems safer… and likelier… but then another danger kicks in: its creators might be too relaxed about boxing it, perhaps allowing it internet access… but the internet contains a wealth of information with which to bootstrap linguistic and psychological knowledge.
There’s an important difference between rejecting MIRI’s hypotheticals because the conclusions don’t follow from the antecedents and rejecting them because the antecedents are unlikely in the first place.
Dangers arising from non-AI scenarios don’t prove AI safety. My point was that an AI doesn’t need effectors to be dangerous… information plus sloppy oversight is enough. However, the MIRI scenario seems to require a kind of perfect storm of fast takeoff, overambition, poor oversight, etc.
A superintelligence can be meta-deceptive. Direct inspection of code is a terrible method of oversight, since even simple AIs can work in ways that baffle human programmers.
ETA: On the whole, I object to the antecedents/priors… I think the hypotheticals go through.