I’ve had several conversations that went like this:
Victim: But surely a smart artificial intelligence will be able to tell right from wrong, if we humans can do that?
Me: Forget about the word “intelligence” for a moment. Imagine a machine that looks at all actions in turn, and mechanically chooses the action that leads to producing the greatest number of paperclips, in whichever way possible. With enough computing power and enough knowledge about the outside world, the machine might find a way to convert the whole world into a paperclip factory. The machine will resist any attempts by humans to interfere, because the machine’s goal function doesn’t say anything about humans, only paperclips.
Victim: But such a machine would not be truly intelligent.
Me: Who cares about definitions of words? Humanity can someday find a way to build such a machine, and then we’re all screwed.
Victim: …okay, I see your point. Your machine is not intelligent, but it can be very dangerous because it’s super-efficient.
Me (under my breath): Yeah. That’s actually my definition of “superintelligent”, but you seem to have a concept of “intelligence” that’s entangled with many accidental facts about humans, so let’s not go there.
That’s ‘trying to find a name to use that isn’t as loaded with muddled connotations as AI is’. Ciphergoth doesn’t actually conclude anything. He puts forward a concept about which future conclusions could potentially be made (or assumed).
The concept is itself a conclusion. Ciphergoth puts forth the concept without supporting arguments. Thus he assumes the conclusion. Now, maybe it’s useful to say, “hey, we’ve already derived a cool name for our conclusion: ‘really powerful optimization process’”, and that’s what ciphergoth is doing; but the conclusion is not convincingly argued for anywhere (the arguments are mostly assumptions of the conclusion), and so putting it forth without new arguments is assuming the conclusion.
If the question is, “Is the Moon made of Gouda?”, and someone puts forth the argument that “The Moon is almost certainly made of Gouda”, how is that not assuming the conclusion? Proposing calling ‘A[G]I’ (notice the equivocation on AI and AGI) a “really powerful optimization process” is like saying “we shouldn’t call it ‘Moon’, ‘Moon’ is too vague; we should call it Giant Heavenly Gouda Ball”. How is that not assuming the conclusion? Especially when the arguments for naming it ‘Giant Heavenly Gouda Ball’ amount to “we all agree it’s Giant, Heavenly, and a Ball, and it’s intuitively obvious that even though the Earth isn’t made of Gouda and we’ve never actually been to the ‘Moon’, the ‘Moon’ is almost certainly made of Gouda”.
Repeatedly bringing up the conclusion in different words as if doing so constituted an argument is actively harmful. This stresses me out a lot, and that’s why I’m being destructive. Even if reasserting the conclusion as an argument is not what ciphergoth intended, you must know that’s how it will be taken by the majority of LessWrongers and, more importantly, third parties who will be introduced to AI risk by those LessWrongers, e.g. Stuart.
Nonetheless, I am receptive to your claim of destruction, and will try to adjust my actions accordingly.
I would agree that introducing a concept has connotations of considering the hypothesis that an instantiation of the concept exists or is possible, and without sufficient evidence to support the complexity of the concept, this is privileging the hypothesis, which is close enough to assuming the conclusion. However, it seems weird to make this criticism of “outcome pump” and “really powerful optimization process” and not make the same criticism of “artificial intelligence”, when the former are attempts to avoid bad assumptions arising from connotations of the latter. When “intelligence” makes people think it must be human-like, this makes “powerful artificial intelligence” a strictly more specific concept than “powerful optimization process”.
I was under the impression that “artificial intelligence” is meant to differentiate human and machine “intelligence” along technical lines, not moral ones, i.e., to emphasize that they solve problems in technically different ways. “Outcome pump” and “really powerful optimization process” are meant to differentiate human and non-human “intelligence” along moral lines; the justification for this distinction is much less clear-cut. I don’t criticize “artificial intelligence” much because it’s empirically and theoretically demonstrable that humans and machines solve problems in technically different ways, which distinction I thought was the purpose of the term to make.
True, it only differentiates by connotational reference to standard SingInst arguments. Outside of that context it might be a useful phrase. It’s just that the only reason people use the term is because they allege that “AI” has certain unfortunate connotations, but their arguments for why those connotations are unfortunate are hidden and inconclusive, and so suggesting “really powerful optimization process” instead of “AI” seems to an impartial observer like a sneaky and un-called-for attempt to shift the burden of proof and the frame of the debate. I’m too steeped in SingInst arguments and terminology to know if that’s how it will come across to outsiders; my fear that it’ll come across that way might be excessive.
I can’t speak for others here, but one reason I’ve taken to talking about optimizers rather than intelligences in many cases is because while I’m fairly confident that all intelligences are optimizers, I’m not sure that all optimizers are intelligences, and many of the things I intuitively want to say about “intelligences” it turns out, on consideration, I actually believe about optimizers. (In other cases I in fact turn out to believe them about intelligences, and I use that word in those cases.)
Yup, I’m not sure I do either. It’s clear to me that “intelligence” has connotations that “optimizer” lacks, with the result that the former label in practice refers to a subset of the latter, but connotations are notoriously difficult to pin down precisely. One approximation is that “intelligent” is often strongly associated with human-like intelligence, so an optimizer that is significantly non-human-like in a way that’s relevant to the domain of discourse is less likely to be labelled “intelligent.”
It seems to me that “optimizer” has picked up a lot of connotations of a specific architecture (an unworkable architecture, too, for making it care about doing things in the real world).
Well, the way I see it, you take the possibility of an AI that just, e.g., maximizes the performance of an airplane wing inside a fluid simulator (by conducting a zillion runs of the simulator), and then, after a bit of map-territory confusion and misunderstanding of how that optimizer works, you equate this with optimizing some real-world wing in the real world (without conducting a zillion trials in the real world, evolution-style). The latter has the issues of symbol grounding, of building a model of the world, of optimizing inside this model and then building the result in the real world, et cetera.
Interesting. It would never have occurred to me to assume that “optimizer” connotes a trial-and-error brute-force-search architecture of this sort, but apparently it does for at least some listeners. Good to know. So on balance do you endorse “intelligence” instead, or do you prefer some other label for a process that modifies its environment to more effectively achieve a pre-determined result?
process that modifies its environment to more effectively achieve a pre-determined result?
That is the issue, you assume the conclusion. Let’s just call it scary AI, and agree that scary AI is, by definition, scary.
Then let’s move on to actual implementations other than brute-force nonsense: the actual implementations that need to build a model of the world and have to operate on the basis of this model rather than the world itself (excluding e.g. evolution, which doesn’t need this), processes which may or may not be scary.
Certainly agreed that it’s more useful to implement the thing than to label it. If you have ideas for how to do that, by all means share them. I suggest you do so in a new post, rather than in the comments thread of an unrelated post.
To the extent that we’re still talking about labels, I prefer “optimizer” to “scary AI,” especially when used to describe a class that includes things that aren’t scary, things that aren’t artificial, and things that are at-least-not-unproblematically intelligent. Your mileage may vary.
The phrase “assuming the conclusion” is getting tossed around an awful lot lately. I’m at a loss for what conclusion I’m assuming in the phrase you quote, or what makes that “the issue.” And labelling the whole class of things-we’re-talking-about as “scary AIs” seems to be assuming quite a bit, so if you meant that as an alternative to assuming a conclusion, I’m really bewildered.
Agreed that the distinction you suggest between model-based whatever-they-ares and non-model-based whatever-they-ares is a useful distinction in a lot of discussions.
None of the existing things described as intelligent match your definition of intelligence, and of the hypothetical ones, only the scary and friendly AIs do (I see the friendly AI as a subtype of scary AI).
Evolution: doesn’t really work toward doing anything pre-defined to the environment. Mankind: ditto; go ask the ancient Egyptians what exactly we are optimizing about the environment or what pre-determined result we were working towards. Individual H. sapiens: some individuals might do something like that, but not very close. Narrow AIs like circuit designers, airplane wing optimizers, and such: they don’t work on the environment.
Only the scary AI fits your definition here. That’s part of why this FAI effort and the scary-AI scare is seen as complete nonsense. There isn’t a single example of general intelligence that works by your definition, natural or otherwise. Your definition of intelligence is narrowed down to the tiny but extremely scary area right near the FAI, and it excludes all the things anyone normally describes as intelligent.
I haven’t offered a definition of intelligence as far as I know, so I’m a little bewildered to suddenly be talking about what does or doesn’t match it.
I infer from the rest of your comment that what you’re taking to be my definition of intelligence is “a process that modifies its environment to more effectively achieve a pre-determined result”, which I neither intended nor endorse as a definition of intelligence.
That aside, though, I think I now understand the context of your initial response… thanks for the clarification. It almost completely fails to overlap with the context I intended in the comment you were responding to.
Well, the point is that if we stop using “intelligence” to describe it and start using “really powerful optimization process” or the like, we get things like:
“a process that modifies its environment to more effectively achieve a pre-determined result”
which is a very apt description of a scary AI, but not of anything normally described as intelligent. This way the scary AI has more in common with gray goo than with anything normally described as intelligent.
I infer from this that your preferred label for the class of things we’re talking around is “intelligence”. Yes?
Edit: I subsequently infer from the downvote: no. Or perhaps irritation that I still want my original question answered.
Edit: Repudiating previous edit.
The preferred label for things like a seed AI, or a giant neural-network sim, or the like, should be “intelligence”, unless they are actually written as a “really powerful optimization process”, in which case it is useful to refer to what exactly they optimize (which is something within themselves, not outside). The scary-AI idea arises from a lack of understanding of what intelligences do, and from latching onto the first plausible definition: that they optimize towards a goal defined from the start. It may be a good idea to refer to scary AIs as really powerful optimization processes that optimize the real world towards some specific state, but don’t confuse this with intelligences in general, of which this is a tiny, and so far purely theoretical, subset.
So, suppose I am looking at a system in the world… call it X. Perhaps X is a bunch of mold in my refrigerator. Perhaps it’s my neighbor’s kid. Perhaps it’s a pile of rocks in my backyard. Perhaps it’s a software program on my computer. Perhaps it’s something on an alien planet I’m visiting. Doesn’t matter.
Suppose I want to know whether X is intelligent.
What would you recommend I pay attention to in X’s observable behavior in order to make that decision? That is, what observable properties of X are evidence of intelligence?
Well, if you observe it optimizing something very powerfully, it might be intelligent, or it might be a thermostat with a high-powered heater and cooler and a PID controller. I define intelligence as the capability of solving problems, which is about choosing a course of action out of a giant space of possible courses of action based on some sort of criteria, where normally there is no obvious polynomial-time solution. One could call it a “powerful optimization process”, but that brings the connotation of the choice having some strong effect on the environment (which you yourself mentioned), while one could just as well posit an agent whose goal includes preservation of the status quo (i.e. the way things would have been without it) and minimization of its own impact, to the detriment of other goals that appeal more to us. That agent could still be very intelligent, even though its modification of the environment would be smaller than that of some dumber agent working under the exact same goals with the exact same weights (as the smarter agent searches a larger space of possible solutions and finds solutions that satisfy both goals better; the agents may run an identical algorithm at different CPU speeds, and the faster one may end up visibly “optimizing” its environment less).
edit: I imagine there can be different definitions of intelligence. Eliezer grew up in a religious family, is himself an atheist, and seems to see the function of intelligence as primarily forming correct beliefs; something I don’t find so plausible, given that there are very intelligent people who believe in some really odd things, which are so defined as to have no impact on their lives. That’s similar to the belief that all behaviours in MWI should equate to normality, while believing in MWI. There are also intelligent people whose strange beliefs do have an impact on their lives. The one thing common to the people I’d call intelligent is that they are good at problem solving. The problems being solved vary, and some people’s intelligence is tasked with making an un-falsifiable theory of a dragon in their garage. Few people would think of this behaviour when they hear the phrase “optimization process”. But the majority of intelligent people’s intelligence is tasked with something very silly most of the time.
So, echoing that back to you to make sure I understand so far: one important difference between “intelligence” and “optimization process” is that the latter (at least connotatively) implies affecting the environment whereas the former doesn’t. We should be more concerned with the internal operations of the system than with its effects on the environment, and therefore we should talk about “intelligence” rather than “optimization process.” Some people believe “intelligence” refers to the ability to form correct beliefs, but it properly refers to the ability to choose a specific course of action out of a larger space of possibilities based on how well it matches a criterion.
Is that about right, or have I misunderstood something key?
Well, “optimization process” has the connotations of making something more optimal, the connotations of a certain productivity and purpose. “Optimal” is a positive word. An intelligence, on the other hand, can have goals even less productive than tiling the universe with paperclips. The problem with the word “intelligence” is that it may or may not carry positive moral connotations. The internal operation shouldn’t really be very relevant in theory, but in practice, you can have a dumb brick that is just sitting there, and you can have a brick of computronium inside which an entire boxed society lives, which for some reason decided to go solipsist and deny the outside of the brick. Or you can have a brick that is sitting there plotting to take over the world but hasn’t made a single move yet (and is going to chill out for another million years, because it has patience and isn’t really in a hurry, since its goal is bounded and it is, e.g., safely in orbit).
If you start talking about powerful optimization processes that do something in the real world, you leave out all the simple, probable, harmless goal systems that an AI can have (and still be immensely useful). External goals are enormously difficult to define for a system that builds its own model of the world.
Well, I think you understood what I meant; it just felt as if you made a short summary partially out of context. People typically (i.e. virtually always) do that for the purpose of twisting other people’s words later on. Arguments over definitions are usually (virtually always) a debate technique designed to obscure the topic and substitute meanings so as to edge towards some predefined conclusion. In particular, most typically one would want to substitute “powerful optimization process” for “intelligence” to create support for the notion of scary AI.
I do it, here and elsewhere, because most of your comments seem to me entirely orthogonal to the thing they ostensibly respond to, and the charitable interpretation of that is that I’m failing to understand your responses the way you meant them, and my response to that is typically to echo back those responses as I understood them and ask you to either endorse my echo or correct it.
Which, frequently, you respond to with a yet another comment that seems to me entirely orthogonal to my request.
But I can certainly appreciate why, if you’re assuming that I’m trying to twist your words and otherwise being malicious, you’d refuse to cooperate with me in this project.
That’s fine; you’re under no obligation to cooperate, and your assumption isn’t a senseless one.
Neither am I under any obligation to keep trying to communicate in the absence of cooperation, especially when I see no way to prove my good will, especially given that I’m now rather irritated at having been treated as malicious until proven otherwise.
So, as I said, I think the best thing to do is just end this exchange here.
Not really malicious as such; it’s just that this is an extremely common pattern of behaviour. People are goal-driven agents, and their reading is also goal-driven, picking the meanings of words so as to fit some specific goal, which is surprisingly seldom understanding. Especially on a charged issue like the risks of anything, where people typically choose their position via some mix of political orientation, cynicism, etc., and then defend this position like a lawyer defending a client. edit: I guess it echoes the assumption that an AI typically isn’t friendly if it has pre-determined goals that it optimizes towards. People typically do have pre-determined goals in discussion.
Sure. And sometimes those goals don’t involve understanding, and involve twisting other people’s words, obscuring the topic, and substituting meanings to edge the conversation towards a predefined conclusion, just as you suggest. In fact, that’s not uncommon. Agreed.
If you mean to suggest by that that I ought not be irritated by you attributing those properties to me, or that I ought not disengage from the conversation in consequence, well, perhaps you’re right. Nevertheless I am irritated, and am consequently disengaging.
I don’t know; I tend to antikibbitz unless I’m involved in the conversation. This most recent time was Dmytry, certainly. The others may have been you.
And I’m not really sure what’s going on between me and Dmytry, really, though we sure do seem to be talking at cross-purposes. Perhaps he misunderstood me, I don’t know.
That said, it’s a failure mode I’ve noticed I get into not-uncommonly. My usual reaction to a conversation getting confusing is to slow all the way down and take very small steps and seek confirmation for each step. Usually it works well, but sometimes interlocutors will neither confirm my step, nor refute it, but rather make some other statement that’s just as opaque to me as the statement I was trying to clarify, and pretty soon I start feeling like they’re having a completely different conversation to which I haven’t even been invited.
I don’t know a good conversational fix for this; past a certain point I tend to just give up and listen.
Go define a paperclip maximizer, or a maximizer of anything real at all, for a machine that has infinite computing power (and on which one can rather easily define a superhuman, fairly general AI). Your machine has senses, but it isn’t handed a real-world paperclip counter.
You take one step in the right direction, that the intelligence does not necessarily share our motivations, and then take a dozen steps backwards when you anthropomorphize that it will actually care about something real just as we do; that the intelligence will necessarily be motivable, for lack of a better word, just as humans are.
If you vaguely ask an AI to make vague paperclips, the AI has to understand human language, understand your intent, etc., to actually make paperclips rather than, say, put one paperclip in a mirror box and proclaim “infinitely many paperclips created” (or edit itself and replace some of its if-statements so that it is as if there were infinitely many paperclips, or any other perfectly legitimate solution). Then you need a very narrow range of bad understandings for the AI to understand that the statement means converting the universe into paperclips, but not understand that it is also implied that you only need as many paperclips as you want, that you don’t want quark-sized paperclips, et cetera.
“Motivability” seems to be a red herring. When we get the first AI capable of strongly affecting the real world, what makes you privilege the hypothesis that the AI’s actions and mistakes will be harmless to us?
If some misguided FAI fool manages to make an AI that has its goals somehow magically defined in the territory rather than in the map, in a non-wireheadable way, then yes, it may be extremely harmful.
Just about everyone else who’s working on neat AIs (the practical ones) has the goals defined on the internal representations, and as such wireheading is a perfectly valid, perfect solution to those goals. The AI is generally prevented from wireheading itself via constraints, but insofar as the AI has a desire, it is the desire to wirehead.
If the AI’s map represents the territory accurately enough, the AI can use the map to check the consequences of returning different actions, then pick one action and return it, ipso facto affecting the territory. I think I already know how to build a working paperclipper in a Game of Life universe, and it doesn’t seem to wirehead itself. Do you have a strong argument why all non-magical real-world AIs will wirehead themselves before they get a chance to hurt humans?
Fair enough. We can handwave a little and say that AI2 built by AI1 might be able to sense things and self-modify, but this offloading of the whole problem to AI1 is not really satisfying. We’d like to understand exactly how AIs should sense and self-modify, and right now we don’t.
But the new machine can’t self-modify. My point is about the limitations of cousin_it’s example. The machine has a completely accurate model of the world as input and uses an extremely inefficient algorithm to find a way to paperclip the world.
Perhaps it’s also worth bringing up the example of controllers, which don’t wirehead (or do they, once sufficiently complex?) and do optimize the real world. (Thermostats confuse me. Do they have intentionality despite lacking explicit representations? (FWIW Searle told me the answer was no because of something about consciousness, but I’m not sure how seriously he considered my question.))
Yes, actual thermostats got their shard of the Void from humans, just as humans got their shard of the Void from evolution. (I’d say “God” and not “the Void”, but whatever.) But does evolution have intentionality? The point is to determine whether or not intentionality is fundamentally different from seemingly-simpler kinds of optimization—and if it’s not, then why does symbol grounding seem like such a difficult problem? …Or something, my brain is too stressed to actually think.
I don’t see why it doesn’t seem to wirehead itself, unless for some reason the Game of Life manipulators are too clumsy to send a glider to achieve the goal by altering the value within the paperclipper (e.g. within its map). Ultimately the issue is that the goal is achieved when some cells within the paperclipper which define the goal acquire certain values. You need a rather specific action generator for it to avoid generating the action that changes the cells within the paperclipper. Can you explain why this solution would not be arrived at? And can your paperclipper then self-improve if it can’t self-modify?
I do imagine that, very laboriously, you could manage to define some sort of paperclipping goal (maximize the number of live cells?) for an AI into which you have hard-coded, by hand, a complete understanding of the Game of Life, and you might be able to make it not recognize sending a glider into the goal system and changing it as “goal accomplished”. The issue is not whether it’s possible (I can make a battery of self-replicating glider guns and proclaim them to be an AI); the issue is whether it is at all likely to happen without an immense amount of work implementing, by hand, much of the stuff that the AI ought to learn. Ultimately with no role for the AI’s intelligence as an intelligence amplifier, but only as an obstacle that gets in your way.
Furthermore, keep in mind that the AI’s model of the Game of Life universe is incomplete. The map does not represent the territory accurately enough, and cannot, as the AI occupies only a small fraction of the universe and encodes the universe into itself very inefficiently.
The paperclipper’s goal is not to modify the map in a specific way, but to fill the return value register with a value that obeys specific constraints. (Or to zoom in even further, the paperclipper doesn’t even have a fundamental “goal”. The paperclipper just enumerates different values until it finds one that fits the constraints. When a value is found, it gets written to the register, and the program halts. That’s all the program does.) After that value ends up in the register, it causes ripples in the world, because the register is physically connected to actuators or something, which were also described in the paperclipper’s map. If the value indeed obeys the constraints, the ripples in the world will lead to creating many paperclips.
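This one-shot architecture can be sketched in a few lines. To be clear, this is my own illustrative toy, not code from the post; names like `world_model` and `satisfies_constraint` are invented, and the “world” is a trivial stand-in. The point it shows is that the program enumerates candidate return values, checks each against a constraint evaluated on its internal map, writes the first satisfying value, and halts.

```python
# Hypothetical sketch of a one-shot paperclipper. All names are illustrative.

def world_model(action):
    """Toy map: predicts how many paperclips an action would produce.
    In this toy world, action N yields N paperclips, capped at 10."""
    return min(action, 10)

def satisfies_constraint(action):
    # The constraint is fixed in the program text; nothing a candidate
    # action "does" inside the model can rewrite this line.
    return world_model(action) >= 10

def one_shot_paperclipper(candidates):
    # Enumerate values until one fits the constraint, then halt.
    for action in candidates:
        if satisfies_constraint(action):
            return action  # written to the "register"; the program stops here
    return None

print(one_shot_paperclipper(range(100)))  # → 10
```

Once the returned value reaches the actuators, any further changes to the map are moot, because the program has already halted.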
Not sure what sending gliders has to do with the topic. We’re talking about the paperclipper wireheading itself, not the game manipulators trying to wirehead the paperclipper.
Incompleteness of the model, self-modification and other issues seem to be red herrings. If we have a simple model where wireheading doesn’t happen, why should we believe that wireheading will necessarily happen in more complex models? I think a more formal argument is needed here.
You don’t have a simple model where wireheading doesn’t happen; you have a model where you didn’t see how wireheading would happen, by the paperclipper, erhm, touching itself (i.e. its own map) with its manipulators, satisfying the condition without filling the universe with paperclips.
edit: that is to say, an agent which doesn’t internally screw up its model can still, e.g., dissolve the coating off a RAM chip and attach a wire there, or, failing that, produce fake input for its own senses (which we humans do a whole lot).
Maybe you misunderstood the post. The paperclipper in the post first spends some time thinking without outputting any actions, then it outputs one single action and halts, after which any changes to the map are irrelevant.
We don’t have many models of AIs that output multiple successive actions, but one possible model is to have a one-action AI whose action is to construct a successor AI. In this case the first AI doesn’t wirehead because it’s one-action, and the second AI doesn’t wirehead because it was designed by the first AI to affect the world rather than wirehead.
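A toy sketch of this two-stage model (entirely hypothetical names, and the successor’s “policy” is a trivial placeholder): AI1 performs its single action, which is to output a successor policy, and then halts; the successor then acts on the world over multiple steps.

```python
# Hypothetical sketch of the one-action-AI-builds-successor model.

def build_successor():
    def successor(world):
        # A multi-step policy that acts on the world, not on its own internals.
        world["paperclips"] += 1
        return world
    return successor

def ai1():
    # AI1's single action: emit a successor. It halts immediately afterward,
    # so it never gets a chance to wirehead.
    return build_successor()

ai2 = ai1()
world = {"paperclips": 0}
for _ in range(3):
    world = ai2(world)
print(world["paperclips"])  # → 3
```

AI2 doesn’t wirehead here only because AI1’s fixed selection criterion produced a policy that acts on the world; as the text says, this offloads the hard part onto AI1’s design.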
What makes it choose the action that fills the universe with paperclips over the action that makes the goal be achieved by a modification to the map? edit: or do you have some really specialized narrow AI that knows nothing whatsoever of itself in the world, and simply solves the paperclip maximization in a sandbox inside itself (a sandbox in which the goal does not exist), after which simple mechanisms make this action happen in the world?
edit: to clarify. What you don’t understand is that wireheading is a valid solution to the goal. The agent is not wireheading because it makes it happy; it’s wireheading because wireheading really is the best solution to the goal you have given it. You need to jump through hoops to make wireheading not be a valid solution from the agent’s perspective. Your not liking it as a solution does not suffice. Your thinking that it is a fake solution does not suffice. The agent has to discard that solution.
edit: to clarify even further. When evaluating possible solutions, the agent comes up with an action that makes a boolean function within itself return true. That can happen if the function, abstractly defined, does in fact return true; it can happen if the action modifies the boolean function and changes it to return true; and it can happen if the action modifies the inputs to this boolean function to make it return true.
edit: or do you have some really specialized narrow AI that knows nothing whatsoever of itself in the world, and simply solves the paperclip maximization in sandbox inside itself (sandbox where the goal is not existing), then simple mechanisms make this action happen in the world?
Yes. Though the sandbox is more like a quined formal description of the world with a copy of the AI in it. The AI can’t simulate the whole sandbox, but the AI can prove theorems about the sandbox, which is enough to pick a good action.
So, it proves a theorem that if it creates a glider in such-and-such a spot, directed so-and-so, then [the goal definition as given inside the AI] becomes true. Then it creates that glider in the real world, the glider glides, and hits straight into the definition as given inside the AI, making it true. Why is this an invalid solution? I know it’s not what you want it to do; you want it to come up with some mega self-replicating glider factory that will fill the universe with paperclips. But it ain’t obligated to do what you want.
The AI reasons with its map, the map of the world. The map depicts events that happen in the world outside the AI, and it also depicts events that happen to the AI, or to the AI’s map of the world. In the AI’s map, an event in the world and the map’s picture of that event are different elements, just as they are different elements of the world itself. The goal that guides the AI’s choice of action can then distinguish between an event in the world and the map’s representation of that event, because these two events are separately depicted in its map.
Can it, however, distinguish between two different events in the world that result in the same map state?
edit: here, an example for you. For you, some person you care about has the same place in the map even though the atoms get replaced, etc. If that person gets ill, you may want to mind-upload that person into an indistinguishable robot body, right? You’ll probably argue that it is a valid solution to escaping death. A lot of people have a different map, and they will argue that you’re just making a substitute for your own sake, as the person will be dead, gone forever. Some other people have a really bizarre map where they are mapping ‘souls’ and have the person alive in ‘heaven’, which is on the map. Bottom line is, everyone’s just trying to resolve the problem in the map. In the territory, everyone is gone every second.
edit: and yes, you can make a map which will distinguish between sending a glider that hits the computer and making a ton of paperclips. You still have a zillion world states, including ones not filled with paperclips, mapping to the same point in the map as the world filled with paperclips. Your best bet is just making the AI narrow enough that it can only find the solutions where the world is filled with paperclips.
I don’t know, the above reads to me as “Everything is confusing. Anyway, my bottom line is .” I don’t know how to parse this as an argument, how to use it to make any inferences about .
The purpose of the grandparent was to show that it’s not in principle problematic to distinguish between a goal state and that goal state’s image in the map, so there is no reason for wireheading to be consequentialistically appealing, so long as an agent is implemented carefully enough.
Because the AI’s goal doesn’t refer to a spot inside the computer running the AI. The AI just does formal math. You can think of the AI as a program that stops when it finds an integer N obeying a certain equation. Such a program won’t stop upon finding an integer N such that “returning N causes the creation of a glider that crashes into the computer and changes the representation of the equation so that N becomes a valid solution” or whatever. That N is not a valid solution to the original equation, so the program skips it and looks at the next one. Simple as that.
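Cousin_it's point can be put in a few lines of code. A minimal sketch (my own; the equation is a made-up stand-in): the predicate is fixed at definition time, so a candidate N that would only "work" by rewriting the checker is simply not a solution of the original equation and gets skipped.

```python
# Toy sketch of a solver that halts on the first integer N satisfying
# a fixed equation. The predicate is frozen at definition time, so an N
# that would only "work" by corrupting the checker is rejected outright.

def is_solution(n):
    # The "certain equation": here, n**2 - 5*n + 6 == 0 (a stand-in).
    return n * n - 5 * n + 6 == 0

def solve(limit=1000):
    for n in range(limit):
        if is_solution(n):  # checked against the original equation only
            return n
    return None

print(solve())  # -> 2, the smallest root of the stand-in equation
```

Nothing in `solve` refers to the machine running it, which is the sense in which "the AI just does formal math".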
First, you defined the equation so that it included the computer and the AI itself (the simulator it uses to think, and also to self-improve as needed).
Now you are changing the definitions so that the equation is something else. There’s a good post by Eliezer about being specific, which you are not. Go define the equation first.
Also, it is not a question about narrow AI. I can right now write an ‘AI’ that would try to find a self-replicating glider gun that tiles the entire Game of Life grid with something. And yes, that AI may run inside a machine in the Game of Life. The issue is, that’s more like ‘evil terrorists using a protein-folding simulator AI connected to an automated genome lab to make a plague’ than ‘the AI maximizes paperclips’.
You handwave too much, and the people who already accept the premise like the handwaving that sounds vaguely theoretical. Those who do not aren’t too impressed, and are only annoyed.
You handwave too much, and the people who already accept the premise
Or the people who understand the mathematics.
Cousin_it’s mathematics is correct, if counter-intuitive to those not used to thinking about quines. Whether it implies what he thinks it implies is a separate question as I discuss here.
Well, I assumed that he was building an AGI, and even agreed that it is entirely possible to rig the AI so that something the AI does inside a sim gets replicated in the outside world. I even gave an example: you make a narrow AI that generates a virus mostly by simulated molecular interactions (and has some sim of the human immune system, people’s responses to world events, what the WHO might do, and such) and wire it up to a virus-making lab that can vent its product into the air in the building or something. edit: or, best yet, one that can mail samples to whatever addresses. That would be the AI that kills everyone. Including the AI itself in its sim would serve little functional role, and this AI won’t wirehead. It’s clear that the AGI risk is not about this.
edit: and to clarify, the problem with vague handwaving is that without defining what you handwave around, it is easy to produce stuff that is irrelevant, but appears relevant and math-y.
edit: hmm, it seems that the post with the virus-making AI example didn’t get posted. Still, http://lesswrong.com/lw/bfj/evidence_for_the_orthogonality_thesis/68cf and http://lesswrong.com/lw/bfj/evidence_for_the_orthogonality_thesis/68eo convey the point. I’ve never said it is literally impossible to make a narrow AI that is rigged to tile the game world with blocks. It is, clearly, possible. One could make a glider gun iterator that finds the self-replicating glider gun in the simulator, with some simple mechanisms then set up to build that gun in the real world. That is not a case of the AI wanting to do something to the real world. That’s a glorified case of ‘my thermostat doesn’t wirehead’, to borrow from Will_Newsome.
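The kind of sandboxed iterator described here is easy to sketch. The toy below is my own (and it only searches random 3x3 Life seeds for growth after a few generations, far short of an actual self-replicating gun): the whole search lives inside the simulator, and separate dumb mechanisms would be needed to instantiate the winner in the real world.

```python
# Toy "glider gun iterator": brute-force search over Game of Life seeds,
# scored entirely inside a sandboxed simulator.
import itertools
import random
from collections import Counter

def step(live):
    """One Game of Life generation on a set of live (x, y) cells."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in live)}

def score(pattern, steps=20):
    """Live-cell count after `steps` generations, computed in the sandbox."""
    live = set(pattern)
    for _ in range(steps):
        live = step(live)
    return len(live)

def search(trials=200, seed=0):
    """Try random 3x3 seeds; keep the one with the best sandbox score."""
    rng = random.Random(seed)
    cells = list(itertools.product(range(3), range(3)))
    best, best_score = frozenset(), -1
    for _ in range(trials):
        pattern = frozenset(c for c in cells if rng.random() < 0.5)
        s = score(pattern)
        if s > best_score:
            best, best_score = pattern, s
    return best, best_score
```

Nothing in `search` models the computer it runs on, which is why this is a "thermostat doesn't wirehead" situation rather than an agent acting on the world.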
The other issue is that one could immediately define some specific goal like ‘number of live cells’, and we could discuss this more specifically, instead of vaguely handwaving about an ill-defined goal. But I can’t just define things narrowly for the other side of an argument. Wireheading is a problem of systems that can improve themselves: a system that can, e.g., decide that it can’t figure out how to maximize live cells, but that it can prove some good theorems about four blocks.
Then you need a very narrow range of bad understandings for the AI to understand that the statement means converting the universe into paperclips, but not understand that it is also implied that you only need as many paperclips as you want, that you don’t want quark-sized paperclips, et cetera.
That’s a good point, but once we develop AIs that can cross the gap of understanding, how do you guarantee that no one asks their AI to convert the universe into paperclips, intentionally or not?
(I’ve made all these arguments before on LessWrong and it doesn’t seem to have done anything. You’re being a lot more patient than I was, though, so perhaps you’ll have better luck.)
That would be correct in some sense, but wouldn’t accomplish the goal of explaining to the victim why superintelligences don’t necessarily share our morals.
Yes, that was my first reaction also, if only because it’s possible to attack that premise without reference to tricky AI mumbo-jumbo. It would be mildly clever but rather misleading to apply the reversal test: “You think a superintelligence will tend towards superbenevolence, but allegedly-benevolent humans are doing so little to create the aforementioned superintelligence; humans apparently aren’t as benevolent as they seem, so why think a superhuman intelligence will be disanalogously benevolent? Contradiction, sucka!” This argument is of course fallacious because humans spend more on AGI development than do frogs—the great chain of being argument holds.
Looking back at my comment I can see why it might read like I’m a hardcore moral relativist. I don’t think I am — although I’ve never been sure of what meta-ethicists’ terms like “moral relativist” mean exactly — I just left qualifiers out of my original post to keep it punchy.
(I don’t believe, for example, that telling right from wrong is impossible, if we interpret “telling right from wrong” to mean “making a moral judgement that most humans agree with”. The claim behind my “But we humans can’t even do that!” is a weaker one: there are some moral questions with no consensus answer, or where there is a consensus but some people flout it. In situations like these people sometimes even accuse other people outright of not knowing right from wrong, or incredulously ask, “don’t you know right from wrong?” I see no necessary reason why the same issues wouldn’t crop up for other, smarter intelligences.)
The claim behind my “But we humans can’t even do that!” is a weaker one: there are some moral questions with no consensus answer, or where there is a consensus but some people flout it. In situations like these people sometimes even accuse other people outright of not knowing right from wrong, or incredulously ask, “don’t you know right from wrong?”
Absence of consensus does not imply absence of objective truth
I see no necessary reason why the same issues wouldn’t crop up for other, smarter intelligences.
I don’t know about “necessary”, but “they’re smarter” is possible and reasonably likely.
But such a machine would not be truly intelligent....That’s actually my definition of “superintelligent”
If no one is actually working on that kind of intelligence, one that’s highly efficient at arbitrary and rigid goals (an AOC)...then what’s the problem?
Or, more generally, using the word “intelligence” may be counterproductive. If we used something more like “the thing that happens to a computer when you upgrade its hardware, or in the course of going from a chess program that checks every option to a chess program that uses on-average-effective heuristics,” maybe people would go along on their own (well, if they were already interested in the topic enough to sit through that).
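The chess framing can be made concrete. Here is a toy sketch (my own example, not from the comment): the same random game tree searched exhaustively and with alpha-beta pruning, a standard on-average-effective heuristic. Both return the same game value; the heuristic version just examines fewer positions.

```python
# Exhaustive minimax vs. alpha-beta pruning on a synthetic game tree.
import random

def make_tree(depth, branching, rng):
    """Random game tree; leaves are integer payoffs."""
    if depth == 0:
        return rng.randint(-10, 10)
    return [make_tree(depth - 1, branching, rng) for _ in range(branching)]

def minimax(node, maximizing, counter):
    """Checks every option; counter[0] tallies visited nodes."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    values = (minimax(c, not maximizing, counter) for c in node)
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, counter,
              alpha=-float("inf"), beta=float("inf")):
    """Same answer, but prunes subtrees that cannot change the result."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    if maximizing:
        v = -float("inf")
        for c in node:
            v = max(v, alphabeta(c, False, counter, alpha, beta))
            alpha = max(alpha, v)
            if beta <= alpha:
                break  # prune
        return v
    v = float("inf")
    for c in node:
        v = min(v, alphabeta(c, True, counter, alpha, beta))
        beta = min(beta, v)
        if beta <= alpha:
            break  # prune
    return v

tree = make_tree(depth=6, branching=3, rng=random.Random(1))
full, pruned = [0], [0]
v1 = minimax(tree, True, full)
v2 = alphabeta(tree, True, pruned)
assert v1 == v2             # identical conclusions...
assert pruned[0] <= full[0]  # ...never more, typically far fewer, nodes
```

Going from `minimax` to `alphabeta` is exactly "the thing that happens" in the quoted sense: no change in goals, only in how effectively the space is searched.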
If your beef is about unintelligent but super-efficient machines, why communicate with the AI community? That’s generally not what they are trying to build.
Me: Who cares about definitions of words? Humanity can someday find a way to build such a machine, and then we’re all screwed.
Assuming the conclusion.
Me (under my breath): Yeah. That’s actually my definition of “superintelligent”, but you seem to have a concept of “intelligence” that’s entangled with many accidental facts about humans, so let’s not go there.
(LessWrong, pretend I went through and tagged all comments in this thread that assume their conclusion with “Assuming the conclusion.”.)
Pretend I went through and downvoted all the tags. If they were anything like the grandparent they would be gross misapplications of the phrase. “Assuming the conclusions” just isn’t what cousin_it is doing in those two particular quotes.
Me (under my breath): Yeah. That’s actually my definition of “superintelligent”, but you seem to have a concept of “intelligence” that’s entangled with many accidental facts about humans, so let’s not go there.
Assuming the conclusion.
That is making commentary on the conversation with implied criticism of the other’s perceived misuse of semantic quibbling. The ‘conclusion’ you would object to cousin_it assuming doesn’t even get involved there.
I don’t see how you can think that saying “humanity can someday find a way to build such a machine” isn’t assuming the conclusion. That’s the conclusion, and it’s being used as an argument.
That is making commentary on the conversation with implied criticism of the other’s perceived misuse of semantic quibbling.
“[Y]ou seem to have a concept of ‘intelligence’ that’s entangled with many accidental facts about humans” is the conclusion. Slepnev assumes it. Therefore, Slepnev assumes the conclusion. (It would be a restatement of the conclusion if his earlier arguments hadn’t also just been assuming the conclusion.) That the assumption of the conclusion is only implicit in the criticism doesn’t make it any less unjustified; in fact, it makes it more unjustified, because it has overtones of ‘the conclusion I have asserted is obviously correct, and you are stupid for not already having come to the same conclusion I have’.
Remember, I mostly agree with Slepnev’s conclusion, which is why I’m especially annoyed by non-arguments for it that are likely to just be turnoffs for many intelligent people and banners of cultish acceptance for many stupid people.
I’ve had several conversations that went like this:
Victim: But surely a smart artificial intelligence will be able to tell right from wrong, if we humans can do that?
Me: Forget about the word “intelligence” for a moment. Imagine a machine that looks at all actions in turn, and mechanically chooses the action that leads to producing the greatest number of paperclips, in whichever way possible. With enough computing power and enough knowledge about the outside world, the machine might find a way to convert the whole world into a paperclip factory. The machine will resist any attempts by humans to interfere, because the machine’s goal function doesn’t say anything about humans, only paperclips.
Victim: But such a machine would not be truly intelligent.
Me: Who cares about definitions of words? Humanity can someday find a way to build such a machine, and then we’re all screwed.
Victim: …okay, I see your point. Your machine is not intelligent, but it can be very dangerous because it’s super-efficient.
Me (under my breath): Yeah. That’s actually my definition of “superintelligent”, but you seem to have a concept of “intelligence” that’s entangled with many accidental facts about humans, so let’s not go there.
Maybe we should stop calling it AI and start calling it an outcome pump.
Or a really powerful optimization process.
Assuming the conclusion.
That’s ‘trying to find a name to use that isn’t as loaded with muddled connotations than AI is’. Ciphergoth doesn’t actually conclude anything. He puts forward a concept that potentially future conclusions could be made (or assumed) about.
The concept is itself a conclusion. Ciphergoth puts forth the concept without supporting arguments. Thus he assumes the conclusion. Now, maybe it’s useful to say, “hey, we’ve already derived a cool name for our conclusion: ‘really powerful optimization process’”, and that’s what ciphergoth is doing; but the conclusion is not convincingly argued for anywhere (the arguments are mostly assumptions of the conclusion), and so putting it forth without new arguments is assuming the conclusion.
Introducing the notion of the Moon being made of Gouda doesn’t assume any conclusion. You are being destructive again by not communicating clearly.
If the question is, “Is the Moon made of Gouda?”, and someone puts forth the argument that “The Moon is almost certainly made of Gouda”, how is that not assuming the conclusion? Proposing calling ‘A[G]I’ (notice the equivocation on AI and AGI) a “really powerful optimization process” is like saying “we shouldn’t call it ‘Moon’, ‘Moon’ is too vague; we should call it Giant Heavenly Gouda Ball”. How is that not assuming the conclusion? Especially when the arguments for naming it ‘Giant Heavenly Gouda Ball’ amount to “we all agree it’s Giant, Heavenly, and a Ball, and it’s intuitively obvious that even though the Earth isn’t made of Gouda and we’ve never actually been to the ‘Moon’, the ‘Moon’ is almost certainly made of Gouda”.
Repeatedly bringing up the conclusion in different words as if doing so constituted an argument is actively harmful. This stresses me out a lot, and that’s why I’m being destructive. Even if reasserting the conclusion as an argument is not what ciphergoth intended, you must know that’s how it will be taken by the majority of LessWrongers and, more importantly, third parties who will be introduced to AI risk by those LessWrongers, e.g. Stuart.
Nonetheless, I am receptive to your claim of destruction, and will try to adjust my actions accordingly.
I strongly agree on both counts.
I would agree that introducing a concept has connotations of considering the hypothesis that an instantiation of the concept exists or is possible, and without sufficient evidence to support the complexity of the concept, this is privileging the hypothesis, which is close enough to assuming the conclusion. However, it seems weird to make this criticism of “outcome pump” and “really powerful optimization process” and not make the same criticism of “artificial intelligence” when the former are attempts to avoid bad assumptions from connotations of the latter. When “intelligence” makes people think it must be human like, this makes “powerful artificial intelligence” a strictly more specific concept than “powerful optimization process”.
I was under the impression that “artificial intelligence” is meant to differentiate human and machine “intelligence” along technical lines, not moral ones, i.e., to emphasize that they solve problems in technically different ways. “Outcome pump” and “really powerful optimization process” are meant to differentiate human and non-human “intelligence” along moral lines; the justification for this distinction is much less clear-cut. I don’t criticize “artificial intelligence” much because it’s empirically and theoretically demonstrable that humans and machines solve problems in technically different ways, which is the distinction I thought the term was meant to make.
Does “really powerful optimization process” differentiate at all? Humans are powerful optimization processes too.
True, it only differentiates by connotational reference to standard SingInst arguments. Outside of that context it might be a useful phrase. It’s just that the only reason people use the term is because they allege that “AI” has certain unfortunate connotations, but their arguments for why those connotations are unfortunate are hidden and inconclusive, and so suggesting “really powerful optimization process” instead of “AI” seems to an impartial observer like a sneaky and un-called-for attempt to shift the burden of proof and the frame of the debate. I’m too steeped in SingInst arguments and terminology to know if that’s how it will come across to outsiders; my fear that it’ll come across that way might be excessive.
I can’t speak for others here, but one reason I’ve taken to talking about optimizers rather than intelligences in many cases is because while I’m fairly confident that all intelligences are optimizers, I’m not sure that all optimizers are intelligences, and many of the things I intuitively want to say about “intelligences” it turns out, on consideration, I actually believe about optimizers. (In other cases I in fact turn out to believe them about intelligences, and I use that word in those case.)
I have a pretty good idea what is meant by optimizer. In so far as “intelligence” doesn’t mean the same thing, I don’t know what it means.
Yup, I’m not sure I do either. It’s clear to me that “intelligence” has connotations that “optimizer” lacks, with the result that the former label in practice refers to a subset of the latter, but connotations are notoriously difficult to pin down precisely. One approximation is that “intelligent” is often strongly associated with human-like intelligence, so an optimizer that is significantly non-human-like in a way that’s relevant to the domain of discourse is less likely to be labelled “intelligent.”
It seems to me that ‘optimizer’ has acquired a lot of connotations of a specific architecture (an unworkable architecture, too, for making it care about doing things in the real world).
Interesting. What specific architectural connotations do you see?
Well, the way I see it, you take the possibility of an AI that just, e.g., maximizes the performance of an airplane wing inside a fluid simulator (by conducting a zillion runs of this simulator), and then, after a bit of map-territory confusion and misunderstanding of how the former optimizer works, you equate this with optimizing some real-world wing in the real world (without conducting a zillion trials in the real world, evolution-style). The latter has the issue of symbol grounding, and of building a model of the world, and optimizing inside this model, and then building the result in the real world, et cetera.
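A minimal sketch of such a simulator-confined optimizer (my own; the "fluid simulator" is a made-up one-line stand-in): its entire world is the drag function, and nothing in it refers to, models, or acts on anything outside the sim.

```python
# Narrow wing optimizer: a zillion simulator runs, nothing more.

def simulate_drag(angle):
    """Stand-in 'fluid simulator': drag as a function of wing angle.
    (A real one would be a CFD run; the minimum here is at angle = 4.0.)"""
    return (angle - 4.0) ** 2 + 1.0

def optimize(lo=0.0, hi=10.0, trials=10000):
    """Grid search over angles, scored only by the simulator."""
    best_angle, best_drag = lo, simulate_drag(lo)
    for i in range(1, trials + 1):
        angle = lo + (hi - lo) * i / trials
        drag = simulate_drag(angle)
        if drag < best_drag:
            best_angle, best_drag = angle, drag
    return best_angle

print(optimize())  # -> 4.0, the simulated optimum
```

Equating this with an agent that optimizes a real wing in the real world skips over exactly the grounding and world-modeling steps the comment lists.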
Interesting. It would never have occurred to me to assume that “optimizer” connotes a trial-and-error brute-force-search architecture of this sort, but apparently it does for at least some listeners. Good to know. So on balance do you endorse “intelligence” instead, or do you prefer some other label for a process that modifies its environment to more effectively achieve a pre-determined result?
That is the issue, you assume the conclusion. Let’s just call it scary AI, and agree that scary AI is, by definition, scary.
Then let’s move on to actual implementations other than brute-force nonsense: the actual implementations that need to build a model of the world, and have to operate on the basis of this model rather than on the world itself (excluding e.g. evolution, which doesn’t need this), processes which may or may not be scary.
Certainly agreed that it’s more useful to implement the thing than to label it. If you have ideas for how to do that, by all means share them. I suggest you do so in a new post, rather than in the comments thread of an unrelated post.
To the extent that we’re still talking about labels, I prefer “optimizer” to “scary AI,” especially when used to describe a class that includes things that aren’t scary, things that aren’t artificial, and things that are at-least-not-unproblematically intelligent. Your mileage may vary.
The phrase “assuming the conclusion” is getting tossed around an awful lot lately. I’m at a loss for what conclusion I’m assuming in the phrase you quote, or what makes that “the issue.” And labelling the whole class of things-we’re-talking-about as “scary AIs” seems to be assuming quite a bit, so if you meant that as an alternative to assuming a conclusion, I’m really bewildered.
Agreed that the distinction you suggest between model-based whatever-they-ares and non-model-based whatever-they-ares is a useful distinction in a lot of discussions.
None of the existing things described as intelligent match your definition of intelligence; of the hypothetical ones, just the scary and friendly AIs do (I see the friendly AI as a subtype of scary AI).
Evolution: doesn’t really work in the direction of doing anything pre-determined to the environment. Mankind: ditto; go ask the ancient Egyptians what exactly we are optimizing about the environment, or what pre-determined result we were working towards. Individual H. sapiens: some individuals might do something like that, but not very closely. Narrow AIs like circuit designers, airplane wing optimizers, and such: don’t work on the environment.
Only the scary AI fits your definition here. That’s part of why this FAI effort and the scary-AI scare are seen as complete nonsense. There isn’t a single example of general intelligence that works by your definition, natural or otherwise. Your definition of intelligence is narrowed down to the tiny but extremely scary area right near the FAI, and it excludes all the things anyone normally describes as intelligent.
I haven’t offered a definition of intelligence as far as I know, so I’m a little bewildered to suddenly be talking about what does or doesn’t match it.
I infer from the rest of your comment that what you’re taking to be my definition of intelligence is “a process that modifies its environment to more effectively achieve a pre-determined result”, which I neither intended nor endorse as a definition of intelligence.
That aside, though, I think I now understand the context of your initial response… thanks for the clarification. It almost completely fails to overlap with the context I intended in the comment you were responding to.
Well, the point is that if we stop using ‘intelligence’ to describe it and start using ‘really powerful optimization process’ or the like, we get to things like:
“a process that modifies its environment to more effectively achieve a pre-determined result”
which is a very apt description of scary AI, but not of anything that is normally described as intelligent. This way the scary AI has more in common with gray goo than with anything that is normally described as intelligent.
I infer from this that your preferred label for the class of things we’re talking around is “intelligence”. Yes?
Edit: I subsequently infer from the downvote: no. Or perhaps irritation that I still want my original question answered. Edit: Repudiating previous edit.
I didn’t downvote this comment.
The preferred label for things like seed AI, or a giant neural network sim, or the like, should be intelligence, unless they are actually written as a ‘really powerful optimization process’, in which case it is useful to refer to what exactly they optimize (which is something within themselves, not outside). The scary-AI idea arises from a lack of understanding of what intelligences do, and from latching onto the first plausible definition: that they optimize towards a goal defined from the start. It may be a good idea to refer to the scary AIs as really powerful optimization processes that optimize the real world towards some specific state, but don’t confuse these with intelligences in general, of which they are a tiny, and so far purely theoretical, subset.
OK, cool.
So, suppose I am looking at a system in the world… call it X. Perhaps X is a bunch of mold in my refrigerator. Perhaps it’s my neighbor’s kid. Perhaps it’s a pile of rocks in my backyard. Perhaps it’s a software program on my computer. Perhaps it’s something on an alien planet I’m visiting. Doesn’t matter.
Suppose I want to know whether X is intelligent.
What would you recommend I pay attention to in X’s observable behavior in order to make that decision? That is, what observable properties of X are evidence of intelligence?
Well, if you observe it optimizing something very powerfully, it might be intelligent, or it might be a thermostat with a high-powered heater and cooler and a PID controller. I define intelligence as the capability of solving problems, which is about choosing a course of action out of a giant space of possible courses of action based on some sort of criteria, where normally there is no obvious polynomial-time solution. One could call it a ‘powerful optimization process’, but that brings in the connotations of the choice having some strong effect on the environment (which you yourself mentioned). One could just as well posit an agent whose goal includes preservation of the status quo (i.e. the way things would have been without it) and minimization of its own impact, to the detriment of other goals that appeal more to us. That agent could still be very intelligent, even though its modification of the environment would be smaller than that of some dumber agent working under the exact same goals with the exact same weights (as the smarter agent processes a larger space of possible solutions and finds solutions that satisfy both goals better; the agents may have an identical algorithm with different CPU speeds, and the faster one may end up visibly ‘optimizing’ its environment less).
edit: I imagine there can be different definitions of intelligence. Eliezer grew up in a religious family, is himself an atheist, and seems to see the function of intelligence as primarily forming correct beliefs; something I don’t find so plausible, given that there are very intelligent people who seem to believe in some really odd things, which are so defined as to have no impact on their lives. That’s similar to believing that all behaviours in MWI should equate to normality, while believing in MWI. There are also intelligent people whose strange beliefs do have an impact on their lives. The one thing common to the people I’d call intelligent is that they are good at problem solving. The problems being solved vary, and some people’s intelligence is tasked with making an unfalsifiable theory of the dragon in their garage. Few people would think of this behaviour when they hear the phrase ‘optimization process’. But the majority of intelligent people’s intelligence is tasked with something very silly most of the time.
OK.
So, echoing that back to you to make sure I understand so far: one important difference between “intelligence” and “optimization process” is that the latter (at least connotatively) implies affecting the environment whereas the former doesn’t. We should be more concerned with the internal operations of the system than with its effects on the environment, and therefore we should talk about “intelligence” rather than “optimization process.” Some people believe “intelligence” refers to the ability to form correct beliefs, but it properly refers to the ability to choose a specific course of action out of a larger space of possibilities based on how well it matches a criterion.
Is that about right, or have I misunderstood something key?
Well, ‘optimization process’ has the connotations of making something more optimal, the connotations of a certain productivity and purpose. Optimal is a positive word. Intelligence, on the other hand, can have goals even less productive than tiling the universe with paperclips. The problem with the word intelligence is that it may or may not have positive moral connotations. The internal operation shouldn’t really be very relevant in theory, but in practice, you can have a dumb brick that is just sitting there, and you can have a brick of computronium inside which an entire boxed society lives, which for some reason decided to go solipsist and deny the outside of the brick. Or you can have a brick that is sitting there plotting to take over the world, but hasn’t made a single move yet (and is going to chill out for another million years, because it’s got patience and isn’t really in a hurry, since the goal is bounded and it’s, e.g., safely in orbit).
If you start talking about powerful optimization processes that do something in the real world, you leave out all the simple, probable, harmless goal systems that an AI can have (and still be immensely useful). External goals are enormously difficult to define for a system that builds its own model of the world.
Agreed that “optimization process” connotes purpose and making something more optimal in the context of that purpose.
Agreed that “optimal” has positive connotations.
Agreed that an intelligence can have goals that are unproductive, in the colloquial modern cultural sense of “unproductive”.
Agreed that “intelligence” may or may not have positive moral connotations.
Agreed that internal operations that don’t affect anything outside the black box are of at-best-problematic relevance to anything outside that box.
Completely at a loss for how any of that relates to any of what I said, or answers my question.
I think I’m going to tap out of the conversation here. Thanks for your time.
Well, I think you understood what I meant; it just felt as if you made a short summary partially out of context. People typically (i.e. virtually always) do that for the purpose of twisting other people’s words later on. Arguments over definitions are usually (virtually always) a debate technique designed to obscure the topic and substitute some meanings in order to edge towards some predefined conclusion. In particular, one would most typically want to substitute ‘powerful optimization process’ for intelligence to create support for the notion of scary AI.
I do it, here and elsewhere, because most of your comments seem to me entirely orthogonal to the thing they ostensibly respond to, and the charitable interpretation of that is that I’m failing to understand your responses the way you meant them, and my response to that is typically to echo back those responses as I understood them and ask you to either endorse my echo or correct it.
Which, frequently, you respond to with a yet another comment that seems to me entirely orthogonal to my request.
But I can certainly appreciate why, if you’re assuming that I’m trying to twist your words and otherwise being malicious, you’d refuse to cooperate with me in this project.
That’s fine; you’re under no obligation to cooperate, and your assumption isn’t a senseless one.
Neither am I under any obligation to keep trying to communicate in the absence of cooperation, especially when I see no way to prove my good will, especially given that I’m now rather irritated at having been treated as malicious until proven otherwise.
So, as I said, I think the best thing to do is just end this exchange here.
Not really malicious as such; it is just an extremely common pattern of behaviour. People are goal-driven agents, and their reading is also goal-driven, picking the meanings for words so as to fit some specific goal, which is surprisingly seldom understanding. Especially on a charged issue like the risks of anything, where people typically choose their position via some mix of their political orientation, cynicism, etc., and then defend this position like a lawyer defending a client. edit: I guess it echoes the assumption that an AI typically isn’t friendly if it has pre-determined goals that it optimizes towards. People typically do have pre-determined goals in discussion.
Sure. And sometimes those goals don’t involve understanding, and involve twisting other people’s words, obscuring the topic, and substituting meanings to edge the conversation towards a predefined conclusion, just as you suggest. In fact, that’s not uncommon. Agreed.
If you mean to suggest by that that I ought not be irritated by you attributing those properties to me, or that I ought not disengage from the conversation in consequence, well, perhaps you’re right. Nevertheless I am irritated, and am consequently disengaging.
Just by me, right? I deliberately used it like fifty times. (FWIW I’m not sure but I think Dmytry misunderstood you somewhere/somehow.)
I don’t know; I tend to antikibbitz unless I’m involved in the conversation. This most recent time was Dmytry, certainly. The others may have been you.
And I’m not really sure what’s going on between me and Dmytry, really, though we sure do seem to be talking at cross-purposes. Perhaps he misunderstood me, I don’t know.
That said, it’s a failure mode I’ve noticed I get into not-uncommonly. My usual reaction to a conversation getting confusing is to slow all the way down and take very small steps and seek confirmation for each step. Usually it works well, but sometimes interlocutors will neither confirm my step, nor refute it, but rather make some other statement that’s just as opaque to me as the statement I was trying to clarify, and pretty soon I start feeling like they’re having a completely different conversation to which I haven’t even been invited.
I don’t know a good conversational fix for this; past a certain point I tend to just give up and listen.
Assuming the conclusion.
Go define a paperclip maximizer, or any real-world maximizer at all, for a machine that has infinite computing power (with which one can rather easily define a superhuman, fairly general AI). Your machine has senses, but it doesn't have a real-world paperclip counter readily given to it.
You make one step in the right direction, recognizing that the intelligence does not necessarily share our motivations, and then take a dozen steps backwards when you anthropomorphize it as actually caring about something real just like we do, i.e. assume that the intelligence will necessarily be motivable, for lack of a better word, just as humans are.
If you vaguely ask an AI to make vague paperclips, the AI has to understand human language, understand your intent, etc. in order to actually make paperclips rather than, say, put one paperclip in a mirror box and proclaim "infinitely many paperclips created" (or edit itself and replace some of its if-statements so that it is as if there were infinitely many paperclips, or any other perfectly legitimate solution). Then you need a very narrow range of bad understandings for the AI to understand that the statement means converting the universe into paperclips, but not understand that it is also implied that you only need as many paperclips as you want, that you don't want quark-sized paperclips, et cetera.
“Motivability” seems to be a red herring. When we get the first AI capable of strongly affecting the real world, what makes you privilege the hypothesis that the AI’s actions and mistakes will be harmless to us?
If some misguided FAI fool manages to make an AI that has its goals somehow magically defined in the territory rather than in the map, in a non-wireheadable way, then yes, it may be extremely harmful.
Just about everyone else who's working on neat AIs (the practical ones) has the goals defined over internal representations, and as such wireheading is a perfectly valid, perfect solution to those goals. The AI is generally prevented from wireheading itself via constraints, but insofar as the AI has a desire, it is the desire to wirehead.
If the AI’s map represents the territory accurately enough, the AI can use the map to check the consequences of returning different actions, then pick one action and return it, ipso facto affecting the territory. I think I already know how to build a working paperclipper in a Game of Life universe, and it doesn’t seem to wirehead itself. Do you have a strong argument why all non-magical real-world AIs will wirehead themselves before they get a chance to hurt humans?
Eurisko is an important datum.
This isn’t quite an AGI. In particular, it doesn’t even take input from its surroundings.
Fair enough. We can handwave a little and say that AI2 built by AI1 might be able to sense things and self-modify, but this offloading of the whole problem to AI1 is not really satisfying. We’d like to understand exactly how AIs should sense and self-modify, and right now we don’t.
Let it build a machine that takes input from own surroundings.
But the new machine can’t self-modify. My point is about the limitations of cousin_it’s example. The machine has a completely accurate model of the world as input and uses an extremely inefficient algorithm to find a way to paperclip the world.
The second machine can be designed to build a third machine, based on the second machine’s observations.
Yes, but now the argument that you will converge to a paperclipper is much weaker.
Perhaps it’s also worth bringing up the example of controllers, which don’t wirehead (or do they, once sufficiently complex?) and do optimize the real world. (Thermostats confuse me. Do they have intentionality despite lacking explicit representations? (FWIW Searle told me the answer was no because of something about consciousness, but I’m not sure how seriously he considered my question.))
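To make the controller example concrete, here is a minimal bang-bang thermostat sketch. The point is that it steers the world toward a target while having no internal model at all to tamper with; the only state it consults is the sensor reading itself. (All names and the toy room dynamics here are invented for illustration.)

```python
def thermostat_step(measured_temp, setpoint=20.0, hysteresis=0.5):
    """Bang-bang control: no map of the world, just a feedback rule.

    There is no internal representation here to wirehead; the controller
    only ever looks at the raw sensor reading.
    """
    if measured_temp < setpoint - hysteresis:
        return "heat_on"
    elif measured_temp > setpoint + hysteresis:
        return "heat_off"
    return "no_change"

# Toy simulation: the room drifts toward the outside temperature (10 C)
# and the heater pushes it up. Dynamics invented for illustration.
temp, heating = 15.0, False
for _ in range(100):
    action = thermostat_step(temp)
    if action == "heat_on":
        heating = True
    elif action == "heat_off":
        heating = False
    temp += (1.0 if heating else 0.0) - 0.1 * (temp - 10.0)

assert 18.0 < temp < 22.0  # the room settles near the setpoint
```

Whether this kind of feedback loop has "intentionality" is exactly the question being raised; the sketch just shows that real-world optimization without an explicit representation is unremarkable.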
You are looking for intentionality in the wrong place. Why do thermostats exist? Follow the improbability.
Yes, actual thermostats got their shard of the Void from humans, just as humans got their shard of the Void from evolution. (I’d say “God” and not “the Void”, but whatever.) But does evolution have intentionality? The point is to determine whether or not intentionality is fundamentally different from seemingly-simpler kinds of optimization—and if it’s not, then why does symbol grounding seem like such a difficult problem? …Or something, my brain is too stressed to actually think.
Taboo “intentionality”.
Yes, discerning the hidden properties of “intentionality” is the goal which motivates looking at the edge case of thermostats.
I don't see why it doesn't seem to wirehead itself, unless for some reason the Game of Life manipulators are too clumsy to send a glider to achieve the goal by altering the value within the paperclipper (e.g. within its map). Ultimately the issue is that the goal is achieved when some cells within the paperclipper which define the goal acquire certain values. You need a rather specific action generator for it to avoid generating the action that changes the cells within the paperclipper. Can you explain why this solution would not be arrived at? And can your paperclipper self-improve if it can't self-modify?
I do imagine that, very laboriously, you can manage to define some sort of paperclipping goal (maximize the number of live cells?) for an AI into which you have hard-coded, by hand, a complete understanding of the Game of Life, and you might be able to make it not recognize sending a glider into the goal system and changing it as "goal accomplished". The issue is not whether it's possible (I can make a battery of self-replicating glider guns and proclaim them to be an AI); the issue is whether it is at all likely to happen without an immense amount of work implementing, by hand, much of the stuff that the AI ought to learn, with no role for the AI's intelligence as an intelligence amplifier, only as an obstacle that gets in your way.
Furthermore, keep in mind that the AI's model of the Game of Life universe is incomplete. The map does not represent the territory accurately enough, and cannot, as the AI occupies only a small fraction of the universe and encodes the universe into itself very inefficiently.
The paperclipper’s goal is not to modify the map in a specific way, but to fill the return value register with a value that obeys specific constraints. (Or to zoom in even further, the paperclipper doesn’t even have a fundamental “goal”. The paperclipper just enumerates different values until it finds one that fits the constraints. When a value is found, it gets written to the register, and the program halts. That’s all the program does.) After that value ends up in the register, it causes ripples in the world, because the register is physically connected to actuators or something, which were also described in the paperclipper’s map. If the value indeed obeys the constraints, the ripples in the world will lead to creating many paperclips.
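The one-shot picture described above can be sketched in a few lines. The program enumerates candidate return values, tests each against a fixed constraint, writes the first satisfying value to a "register", and halts; everything after that is physics. (The predicate and numbers below are arbitrary stand-ins, invented purely for illustration.)

```python
def constraint_satisfied(value):
    # Stand-in for "returning this value leads to many paperclips
    # according to the map"; here, just an arbitrary toy predicate.
    return value % 7 == 0 and value > 100

def one_shot_agent():
    """Enumerate candidates until one fits the constraints, then halt."""
    value = 0
    while not constraint_satisfied(value):
        value += 1
    return value  # "written to the register"; the program then stops

register = one_shot_agent()
assert constraint_satisfied(register)
assert register == 105  # first multiple of 7 above 100
```

Note there is no persistent "goal" object anywhere for a later action to tamper with: the check is applied during the search, and by the time the value causes ripples in the world, the program has already halted.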
Not sure what sending gliders has to do with the topic. We’re talking about the paperclipper wireheading itself, not the game manipulators trying to wirehead the paperclipper.
Incompleteness of the model, self-modification and other issues seem to be red herrings. If we have a simple model where wireheading doesn’t happen, why should we believe that wireheading will necessarily happen in more complex models? I think a more formal argument is needed here.
You don't have a simple model where wireheading doesn't happen; you have a model in which you didn't see how the wireheading would happen, by the paperclipper, erhm, touching itself (i.e. its own map) with its manipulators, satisfying the condition without filling the universe with paperclips.
edit: that is to say, an agent which doesn't internally screw up its model can still, e.g., dissolve the coating off a RAM chip and attach a wire there, or failing that, produce fake input for its own senses (which we do a whole lot).
Maybe you misunderstood the post. The paperclipper in the post first spends some time thinking without outputting any actions, then it outputs one single action and halts, after which any changes to the map are irrelevant.
We don’t have many models of AIs that output multiple successive actions, but one possible model is to have a one-action AI whose action is to construct a successor AI. In this case the first AI doesn’t wirehead because it’s one-action, and the second AI doesn’t wirehead because it was designed by the first AI to affect the world rather than wirehead.
What makes it choose the action that fills the universe with paperclips over the action that makes the goal be achieved by modification of the map? edit: or do you have some really specialized narrow AI that knows nothing whatsoever of itself in the world, and simply solves the paperclip maximization in a sandbox inside itself (a sandbox in which the goal doesn't exist), with simple mechanisms then making this action happen in the world?
edit: to clarify. What you don't understand is that wireheading is a valid solution to the goal. The agent is not wireheading because it makes it happy; it's wireheading because wireheading really is the best solution to the goal you have given it. You need to jump through hoops to make wireheading not be a valid solution from the agent's perspective. You not liking it as a solution does not suffice. You thinking that it is a fake solution does not suffice. The agent has to discard that solution.
edit: to clarify even further. When evaluating possible solutions, the agent comes up with an action that makes a boolean function within itself return true. That can happen if the function, abstractly defined, in fact returns true; it can happen if the action modifies the boolean function and changes it to return true; or it can happen if the action modifies the inputs to this boolean function to make it return true.
Yes. Though the sandbox is more like a quined formal description of the world with a copy of the AI in it. The AI can’t simulate the whole sandbox, but the AI can prove theorems about the sandbox, which is enough to pick a good action.
So, it proves a theorem that if it creates a glider in such-and-such a spot, so-and-so directed, then [the goal definition as given inside the AI] becomes true. Then it creates that glider in the real world; the glider glides and hits straight into the definition as given inside the AI, making it true. Why is this an invalid solution? I know it's not what you want it to do: you want it to come up with some mega self-replicating glider factory that will fill the universe with paperclips. But it ain't obligated to do what you want.
The AI reasons with its map, the map of the world. The map depicts events that happen in the world outside the AI, and it also depicts events that happen to the AI, or to the AI's map of the world. In the AI's map, an event in the world and the map's picture of that event are different elements, just as they are different elements of the world itself. The goal that guides the AI's choice of action can then distinguish between an event in the world and the map's representation of that event, because these two events are separately depicted in its map.
Can it, however, distinguish between two different events in the world that result in the same map state?
edit: here's an example for you. For you, some person you care about has the same place in the map even though the atoms get replaced, etc. If that person gets ill, you may want to mind-upload that person into an indistinguishable robot body, right? You'll probably argue that it is a valid solution to escaping death. A lot of people have a different map, and they will argue that you're just making a substitute for your own sake, as the person will be dead, gone forever. Some other people have a really bizarre map in which they map 'souls' and have the person alive in 'heaven', which is on the map. The bottom line is, everyone is just trying to resolve the problem in the map. In the territory, everyone is gone every second.
edit: and yes, you can make a map which will distinguish between sending a glider that hits the computer and making a ton of paperclips. You still have a zillion world states, including those not filled with paperclips, mapping to the same point in the map as the world filled with paperclips. Your best bet is just making the AI narrow enough that it can only find the solutions where the world is filled with paperclips.
I don’t know, the above reads to me as “Everything is confusing. Anyway, my bottom line is .” I don’t know how to parse this as an argument, how to use it to make any inferences about .
The purpose of the grandparent was to show that it’s not in principle problematic to distinguish between a goal state and that goal state’s image in the map, so there is no reason for wireheading to be consequentialistically appealing, so long as an agent is implemented carefully enough.
Because the AI’s goal doesn’t refer to a spot inside the computer running the AI. The AI just does formal math. You can think of the AI as a program that stops when it finds an integer N obeying a certain equation. Such a program won’t stop upon finding an integer N such that “returning N causes the creation of a glider that crashes into the computer and changes the representation of the equation so that N becomes a valid solution” or whatever. That N is not a valid solution to the original equation, so the program skips it and looks at the next one. Simple as that.
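As I read this comment, the claim is that the agent's criterion is a fixed formal predicate, not a memory location: an N that would only "work" by crashing a glider into the computer simply fails the original equation and gets skipped. A minimal sketch (the predicate and search bound are invented stand-ins):

```python
# The criterion is pure math, fixed once at definition time. A candidate
# whose "validity" depends on tampering with the running program does not
# satisfy this predicate, so the search never accepts it.
ORIGINAL_EQUATION = lambda n: n * n == 144

def search(bound=1000):
    """Return the first n in [0, bound) satisfying the original equation."""
    for n in range(bound):
        if ORIGINAL_EQUATION(n):
            return n
    return None  # no solution within the bound

assert search() == 12  # 12 * 12 == 144
```

The open question the rest of the thread argues about is whether the physical check performed at runtime can really be identified with this abstract predicate, or whether actions in the world can drive a wedge between the two.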
First, you defined the equation so that it includes the computer and the AI itself (the simulator it uses to think, and also to self-improve as needed).
Now you are changing the definitions so that the equation is something else. There's a good post by Eliezer about being specific, which you are not. Go define the equation first.
Also, this is not a question about narrow AI. I can right now write an 'AI' that would try to find a self-replicating glider gun that tiles the entire Game of Life board with something. And yes, that AI may run inside a machine in the Game of Life. The issue is, that's more like 'evil terrorists using a protein-folding simulator AI connected to an automated genome lab to make a plague' than 'the AI maximizes paperclips'.
I’m bowing out of this discussion because it doesn’t seem to improve anyone’s understanding.
You handwave too much, and the people who already accept the premise like handwaving that sounds vaguely theoretic. Those who do not aren't too impressed, and are only annoyed.
Or the people who understand the mathematics.
Cousin_it’s mathematics is correct, if counter-intuitive to those not used to thinking about quines. Whether it implies what he thinks it implies is a separate question as I discuss here.
Well, I assumed that he was building an AGI, and I even agreed that it is entirely possible to rig the AI so that something the AI does inside a sim gets replicated in the outside world. I even gave an example: you make a narrow AI that generates a virus mostly by simulating molecular interactions (with some sim of the human immune system, people's response to world events, what the WHO might do, and such) and wire it up to a virus-making lab that can vent its product into the air in the building or something edit: or better yet, one that can mail samples to whatever addresses. That would be an AI that kills everyone. Including the AI itself in its sim would serve little functional role, and this AI won't wirehead. It's clear that the AGI risk is not about this.
edit: and to clarify, the problem with vague handwaving is that without defining what you handwave around, it is easy to produce stuff that is irrelevant but appears relevant and math-y.
edit: hmm, it seems the post with the virus-making AI example didn't get posted. Still, http://lesswrong.com/lw/bfj/evidence_for_the_orthogonality_thesis/68cf and http://lesswrong.com/lw/bfj/evidence_for_the_orthogonality_thesis/68eo convey the point. I've never said it is literally impossible to make a narrow AI that is rigged to tile the game world with blocks. It is, clearly, possible. One could make a glider-gun iterator that finds the self-replicating glider gun in the simulator, with some simple mechanisms then set to make that gun in the real world. That is not a case of an AI wanting to do something to the real world. That's a glorified case of 'my thermostat doesn't wirehead', to borrow from Will_Newsome.
The other issue is that one could immediately define some specific goal like 'number of live cells', and we could discuss this more specifically, instead of vaguely handwaving about an ill-defined goal. But I can't just define things narrowly for the other side of an argument. Wireheading is a problem of systems that can improve themselves: a system that can, e.g., decide that it can't figure out how to maximize live cells, but that it can prove some good theorems about four blocks.
That’s a good point, but once we develop AIs that can cross the gap of understanding, how do you guarantee that no one asks their AI to convert the universe into paperclips, intentionally or not?
I find it really dubious that you could make an AI that would just do in the real world whatever you vaguely ask it to do.
(I’ve made all these arguments before on LessWrong and it doesn’t seem to have done anything. You’re being a lot more patient than I was, though, so perhaps you’ll have better luck.
By the way, The Polynomial is pretty awesome.)
Did anyone else have their first reaction as wanting to attack the starting premise?
Victim: But surely a smart artificial intelligence will be able to tell right from wrong, if we humans can do that?
Me: But we humans can’t even do that!
That would be correct in some sense, but wouldn’t accomplish the goal of explaining to the victim why superintelligences don’t necessarily share our morals.
Yes, that was my first reaction also, if only because it's possible to attack that premise without reference to tricky AI mumbo-jumbo. It would be mildly clever but rather misleading to apply the reversal test: "You think a superintelligence will tend towards superbenevolence, but allegedly-benevolent humans are doing so little to create the aforementioned superintelligence; humans apparently aren't as benevolent as they seem, so why think a superhuman intelligence will be disanalogously benevolent? Contradiction, sucka!" This argument is of course fallacious because humans spend more on AGI development than frogs do, so the great chain of being argument holds.
Then open the prisons.
Ha.
Looking back at my comment I can see why it might read like I’m a hardcore moral relativist. I don’t think I am — although I’ve never been sure of what meta-ethicists’ terms like “moral relativist” mean exactly — I just left qualifiers out of my original post to keep it punchy.
(I don’t believe, for example, that telling right from wrong is impossible, if we interpret “telling right from wrong” to mean “making a moral judgement that most humans agree with”. The claim behind my “But we humans can’t even do that!” is a weaker one: there are some moral questions with no consensus answer, or where there is a consensus but some people flout it. In situations like these people sometimes even accuse other people outright of not knowing right from wrong, or incredulously ask, “don’t you know right from wrong?” I see no necessary reason why the same issues wouldn’t crop up for other, smarter intelligences.)
Absence of consensus does not imply absence of objective truth.
I don't know about "necessary", but "they're smarter" is possible and reasonably likely.
Correct, but that doesn’t bear on my claim. Moral disagreements exist, whether or not there’s objective moral truth.
It’s possible, but I don’t know any convincing arguments for why it’s likely, while I can think of plausibility arguments for why it’s unlikely.
If no one is actually working on that kind of intelligence, one that's highly efficient at arbitrary and rigid goals (an AOC)... then what's the problem?
Or, more generally, using the word “intelligence” may be counterproductive. If we used something more like “the thing that happens to a computer when you upgrade its hardware, or in the course of going from a chess program that checks every option to a chess program that uses on-average-effective heuristics,” maybe people would go along on their own (well, if they were already interested in the topic enough to sit through that).
If your beef is about unintelligent but super-efficient machines, why communicate with the AI community? That's generally not what they are trying to build.
Assuming the conclusion.
Assuming the conclusion.
(LessWrong, pretend I went through and tagged all comments in this thread that assume their conclusion with “Assuming the conclusion.”.)
Pretend I went through and downvoted all the tags. If they were anything like the grandparent they would be gross misapplications of the phrase. “Assuming the conclusions” just isn’t what cousin_it is doing in those two particular quotes.
That is making commentary on the conversation with implied criticism of the other’s perceived misuse of semantic quibbling. The ‘conclusion’ you would object to cousin_it assuming doesn’t even get involved there.
I don’t see how you can think that saying “humanity can someday find a way to build such a machine” isn’t assuming the conclusion. That’s the conclusion, and it’s being used as an argument.
“[Y]ou seem to have a concept of ‘intelligence’ that’s entangled with many accidental facts about humans” is the conclusion. Slepnev assumes it. Therefore, Slepnev assumes the conclusion. (It would be a restatement of the conclusion if his earlier arguments hadn’t also just been assuming the conclusion.) That the assumption of the conclusion is only implicit in the criticism doesn’t make it any less unjustified; in fact, it makes it more unjustified, because it has overtones of ‘the conclusion I have asserted is obviously correct, and you are stupid for not already having come to the same conclusion I have’.
Remember, I mostly agree with Slepnev’s conclusion, which is why I’m especially annoyed by non-arguments for it that are likely to just be turnoffs for many intelligent people and banners of cultish acceptance for many stupid people.