I second this question. Who is arguing that a genie that “does what it’s told” is easier to make than a genie that “does what is meant”? Eliezer didn’t, at least not in this post:
The user interface doesn’t take English inputs. The Outcome Pump isn’t sentient, remember? But it does have 3D scanners for the near vicinity, and built-in utilities for pattern matching. So you hold up a photo of your mother’s head and shoulders; match on the photo; use object contiguity to select your mother’s whole body (not just her head and shoulders); and define the future function using your mother’s distance from the building’s center. The further she gets from the building’s center, the less the time machine’s reset probability.
You cry “Get my mother out of the building!”, for luck, and press Enter.
The contrast between what is said and what is meant pops up in the general discussion of goals, for example here: http://lesswrong.com/lw/ld/the_hidden_complexity_of_wishes/9nig
Further in that thread there’s something regarding computer scientists’ hypothetical reactions to the discussion of wishes.
Variations on the “curing cancer by killing everyone” theme also pop up quite often.
With regards to the “outcome pump”: it is too magical, and I’ll grant the magical license for it to do whatever the sci-fi writer wants it to do. If you want me to be a buzzkill, though, I can note that one could use this dangerous tool by wishing that, in the future, they press the “I am satisfied” button, which they will also press if a die rolls N consecutive sixes. That puts a limit on the improbability: if steering the die is the most plausible solution, the pump takes it as a fallback, which rules out lower-probability outcomes like spontaneous rewiring of your brain (though it seems to me that a random finger twitch would be far more probable than anything catastrophic anyway). This also removes the need for the user interface, the 3D scanners, and other such extras. I recall another science fiction author pondering something like this, though I can’t recall the name; if memory serves, that author managed to come up with ways to use this time-reset device productively. At the end of the day it’s just a very dangerous tool, like a big lathe: forget the safety and leave the tightening key in, start it, and the key gets caught and thrown off at great speed, possibly killing you.
So, to summarize, you just wish that a button is pressed, and you press the button when your mother is rescued. That will increase your risk of a stroke.
edit: and of course, one could require entry of a password, attach all sorts of medical monitors that block the “satisfied” signal in the event of a stroke or other health complication (so the pump gains nothing by inducing one), as well as vibration monitors that prevent it from triggering natural disasters and such. If the improbability gets too high, it will simply result in the device breaking down, since its normal failure rate gets brought up.
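For concreteness, here is the arithmetic behind the die fallback (the choice of N is my own illustrative number, not anything from the thread): a run of N consecutive sixes has probability (1/6)^N, so picking N caps how improbable a solution the pump can be pushed toward.

```python
# Sketch of the improbability cap provided by the fallback die.
# N = 10 is an illustrative assumption, not a recommendation.

def die_fallback_probability(n_sixes: int) -> float:
    """Probability that a fair die rolls n_sixes consecutive sixes."""
    return (1 / 6) ** n_sixes

p_floor = die_fallback_probability(10)  # about 1.65e-8

# Any exotic route to the button being pressed must be *more* probable
# than simply steering the die, so outcomes rarer than p_floor are
# never selected.
```

Raising N lowers the floor geometrically, at the cost of the device more often “failing” by just rolling the die.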
That comment thread is really, really long (especially if I go up the thread to try to figure out the context of the comment you linked to), and the fact that it’s mostly between people I’ve never paid attention to before doesn’t help raise my interest level. Can you summarize what you perceive the debate to be, and how your post fits into it?
Variations on the “curing cancer by killing everyone” theme also pop up quite often.
When I saw this before (here for example), it was also in the context of “programmer makes a mistake when translating ‘cancer cure’ into formal criteria or utility function” as opposed to “saying ‘cure cancer’ in the presence of a superintelligent AI causes it to kill everyone”.
Can you summarize what you perceive the debate to be, and how your post fits into it?
I perceive that stuff to be really confused/ambiguous (perhaps without a clear concept even existing anywhere), and I’ve seen wishes and goal-making discussed here quite a lot.
When I saw this before (here for example), it was also in the context of “programmer makes a mistake when translating ‘cancer cure’ into formal criteria or utility function” as opposed to “saying ‘cure cancer’ in the presence of a superintelligent AI causes it to kill everyone”.
The whole first half of my post deals with this situation exactly.
You know, everyone here says “utility function” a lot, but no one is ever clear what it is a function of, i.e. what its input domain is (and at times it looks like the everyday meaning of the word “function”, as in “the function of this thing is to do something”, is supposed to be evoked instead). Functions are easier to define over simpler domains: paperclips, for instance, are a lot easier to define in some Newtonian physics as something made out of a wire that’s just magicked from a spool. And a cure for cancer is a lot easier to define as in my example.
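A toy illustration of the point (my own hypothetical construction, not anyone’s actual proposal): a “utility function” only means something once its input domain is pinned down. Here the domain is a minimal stand-in for a Newtonian wire-world state, and the function counts roughly paperclip-shaped pieces of wire.

```python
# Hypothetical toy example: a utility function with an explicitly
# specified input domain (a crude "Newtonian wire world" state).

from dataclasses import dataclass, field
from typing import List

@dataclass
class WireSegment:
    length_mm: float
    bend_count: int  # number of bends in this piece of wire

@dataclass
class WorldState:  # the input domain of the utility function
    segments: List[WireSegment] = field(default_factory=list)

def paperclip_utility(state: WorldState) -> float:
    """Count wire pieces shaped roughly like paperclips (made-up criteria)."""
    return sum(1.0 for s in state.segments
               if s.bend_count >= 3 and 20 <= s.length_mm <= 200)

state = WorldState([WireSegment(length_mm=100, bend_count=3),
                    WireSegment(length_mm=50, bend_count=0)])
# paperclip_utility(state) == 1.0: only the first segment qualifies
```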
Of course, it is a lot easier to say something without ever bothering to specify the context. But if you want to actually think about possible programmer mistakes, you can’t be thinking in terms of what would be easier to say. If you are, then even though you want it to be about programming, it is still only about saying things.
edit: You of all people ought to realize that a faulty definition of a cancer cure over UDT’s world soup is not plausible as an actual approach to curing cancer. If you propose that corners are cut when implementing the notion of curing cancer as a mathematical function, you have to realize that a simple input specification is par for the course (a simple input specification being, say, data from a contemporary biochemical model of a cell). You also have to realize that the likes of UDT and CDT require some sort of “mathematical intuition” that can at least find maxima, and that this component by itself doesn’t do any world-wrecking without being put into a decision framework. It is a component considerably more useful than the whole, especially for plausibly limited “mathematical intuitions” which could take microseconds to find a cure for cancer in the sane way, yet still be unable to even match a housecat when used with some decision theory, taking longer than the lifetime of the universe they are embedded in to produce anything at all.
Do you think we’ll ever have AIs that can accomplish complex real-world goals for us, not just find some solution to a biochemical problem, but say produce and deliver cancer cures to everyone that needs it, or eliminate suffering, or something like that? If not, why not? If yes, how do you think it will work, that doesn’t involve having a utility function over a complex domain?
Do you think we’ll ever have AIs that can accomplish complex real-world goals for us
This has a constraint: it cannot be much more work to specify the goal than to accomplish it some other way.
How I think it will not happen: via manual, unaided, no-inspection, no-viewer, no-nothing, magical creation of a cancer-curing utility function over a model domain so complex that you immediately fall back to “but we can’t look inside” when explaining why the model cannot be used, instead of empty speculation, to see how the cancer curing works out.
How it can work: well, first, it has to be rather obvious to you that the optimization algorithm (your “mathematical intuition”, though realistically with considerably less power than something like UDT would require) can self-improve a lot more effectively without an actual world model, and without an embedding of the self in that world model, than with them. So we have this for “FOOM”.
It can also build a world model without maximizing anything about the world; indeed, part of the decision theory that you know how to formalize is concerned with just that. Realistically, one would want to start with some world-modelling framework more practical than “the space of possible computer programs”.
Only once you have the world model can you realistically start making a utility function, and you do that with considerable feedback from running the optimization algorithm on just the model and inspecting the result.
I assume you do realize that one has to go to great lengths to make runs of the optimization algorithm manifest as real-world changes, whereas dry test runs on just the model are quite easy. I assume you also realize that very detailed simulations are very impractical.
edit: to borrow from a highly relevant Russian proverb: you cannot impress a surgeon with the dangers of a tonsillectomy performed through the anal passage.
Other ways it could work may involve neural-network simulation up to the point where you get something thinking and talking (which you’d get in any case only after years and years of raising it), at which point it’s not that much different from raising a kid to do it, really, and very few people would get seriously worked up about the possibility of our replacement by that.
Once we have this self-improved optimization algorithm, do you think everyone who has access to it will be as careful as you’re assuming? As you say, it’s just a dangerous tool, like a lathe. But unlike a lathe, which can only hurt its operator, this thing could take over the world (via economic competition if not by killing everyone) and use it for purposes I’d consider pointless.
Do you agree with this? If not, how do you foresee the scenario play out, once somebody develops a self-improving optimization algorithm that’s powerful enough to be used as part of an AI that can accomplish complex real world goals? What kind of utility functions do you think people will actually end up making, and what will happen after that?
Once we have this self-improved optimization algorithm, do you think everyone who has access to it will be as careful as you’re assuming?
It looks to me like not performing a tonsillectomy via the anal passage doesn’t require too great a carefulness on the part of the surgeon.
One can always come up with some speculative way in which a particular technological advance can spell our doom. Or avert it: the gradual improvement of optimization algorithms allows for intelligence enhancement and other things that, overall, should lower the danger. The fact that you can generate a scenario in favour of either (starting from the final effect) is entirely uninformative about the total influence.
how do you foresee the scenario play out, once somebody develops a self-improving optimization algorithm that’s powerful enough to be used as part of an AI that can accomplish complex real world goals? What kind of utility functions do you think people will actually end up making, and what will happen after that?
I think I need to ask a question here to be able to answer this in a way that would be relevant to you. Suppose today you got a function that takes a string similar to:
struct domain {
    ... any data
};

real Function(domain value) {
    ... any code
}
and gives you back, as a string, an initializer for “domain” which results in the largest output of the Function. It’s very magically powerful, albeit some ridiculous things (exact simulation from the big bang to the present day, inclusive of the computer running this very algorithm) are reasonably forbidden.
How do you think it could realistically be used, and what mistakes do you picture? Please be specific; when you have a mathematical function, describe its domain, or else I will just assume that the word “function” is meant merely to trigger some innate human notion of purpose.
edit: an extension of the specification. The optimization function now takes a string S and a real number between 0 and 1 specifying the “optimization power”, roughly corresponding to what you can reasonably expect to get by restricting the computational resources available to the optimizer.
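For concreteness, the extended specification amounts to an interface shaped roughly like this (a sketch: the name `optimize` and the brute-force search over an explicit candidate list are my stand-ins for the magical optimizer, which would search the whole domain):

```python
# Stub of the hypothetical optimizer interface. The real thing is magic;
# this stand-in brute-forces a tiny explicit candidate set so the shape
# of the API is concrete.

from typing import Callable, Iterable

def optimize(function: Callable[[str], float],
             candidates: Iterable[str],
             power: float) -> str:
    """Return the candidate initializer string maximizing `function`.

    `power` in (0, 1] plays the role of "optimization power": here it
    simply truncates how much of the candidate pool gets searched,
    mimicking restricted computational resources.
    """
    pool = list(candidates)
    searched = pool[:max(1, int(len(pool) * power))]
    return max(searched, key=function)

# Example: maximizing string length over a toy "domain" of initializers.
best = optimize(lambda s: float(len(s)), ["{1}", "{1,2}", "{1,2,3}"], 1.0)
# best == "{1,2,3}"
```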
Here’s what I’d do:
Step 1: Build an accurate model of someone’s mind. The domain would be the set of possible neural networks, and the Function would run the input neural net and compare its behavior to previously recorded behavior of the target person (perhaps a bunch of chat logs would be easiest), returning a value indicating how well it matches.
Step 2: Use my idea here to build an FAI.
In step 2 it would be easy to take fewer precautions and end up hacking your own mind. See this thread for previous discussion.
(Does this answer your question in the spirit that you intended? I’m not sure, because I’m not sure why you asked the question.)
Thanks. Yes, it does. I asked because I don’t want to needlessly waste a lot of time explaining that one would try to use the optimizer to do some of the heavy lifting (which makes it hard to predict an actual solution). What do you think reckless individuals would do?
By the way, your solution would probably just result in a neural network that hard-wires a lot of the recorded behaviour, without doing anything particularly interesting. Observe that an ideal model, subject to thermal noise, would not produce the best match; a network that connects neurons in parallel to average out the noise and encode the recorded data most accurately would. I am not sure whether fMRI would remedy the problem.
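The overfitting worry can be shown in a toy regression analogue (entirely my own construction, not a claim about neural simulation): when candidate models are scored by match against a single noisy recording, a model that memorizes the recording outscores the “ideal” model that reproduces the true process with fresh noise.

```python
# Toy demonstration: exact-match scoring against one noisy recording
# rewards hard-wiring the recording, not recovering the true process.

import random

random.seed(0)
signal = [float(i % 5) for i in range(200)]             # the "true" process
recording = [x + random.gauss(0, 1.0) for x in signal]  # what was logged

def match_score(model_output, recording):
    """Higher is better: negative mean squared error vs the recording."""
    n = len(recording)
    return -sum((m - r) ** 2 for m, r in zip(model_output, recording)) / n

ideal = [x + random.gauss(0, 1.0) for x in signal]  # right process, fresh noise
memorizer = list(recording)                         # hard-wires the log

assert match_score(memorizer, recording) > match_score(ideal, recording)
```

Whether richer data (e.g. fMRI) fixes this depends on whether the scoring can separate the underlying process from the particular noise realization that happened to be recorded.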
edit: note that this mishap results in Wei_Dai getting some obviously useless answers, not in world destruction.
edit2: by the way, note that a lathe motor with infinite torque, while in some sense capable of infinite power output, doesn’t imply that you can make a mistake that will spin up the Earth and make us all fly off. You need a whole lot of extra magic for that. Likewise, the “outcome pump” needs through-the-wall 3D scanners to be that dangerous to the old woman, and an “UFAI” needs some potentially impossible self-references in the world model and a lot of other magic. Bottom line: there is this jinn/golem/terminator meme, and it gets rationalized in a science-fictional way. But the fact that the golem can be rationalized in a science-fictional way provides no information about the future (because I expect it to be rationalizable in such a manner irrespective of the future), hence zero update; if I didn’t worry before, I won’t start to worry now. Especially considering how often the AI is the bad guy, I really don’t see any reason to think these issues are under-publicized in any way. Whereas the fact that it is awfully hard to rationalize that superdanger when you start with my optimizer (where no magic bans you from making models that you can inspect visually), provides the information against the notion.
I don’t think anyone is claiming that any mistake one might make with a powerful optimization algorithm is a fatal one. As I said, I think the danger is in step 2 where it would be easy to come up with self-mindhacks, i.e., seemingly convincing philosophical insights that aren’t real insights, that cause you to build the FAI with a wrong utility function or adopt crazy philosophies or religions. Do you agree with that?
Whereas the fact that it is awfully hard to rationalize that superdanger when you start with my optimizer (where no magic bans you from making models that you can inspect visually), provides the information against the notion.
Are you assuming that nobody will be tempted to build AIs that make models and optimize over models in a closed loop (e.g., using something like Bayesian decision theory)? Or that such AIs are infeasible or won’t ever be competitive with AIs that have hand-crafted models that allow for visual inspection?
I don’t think anyone is claiming that any mistake one might make with a powerful optimization algorithm is a fatal one.
Well, some people do, by a trick of substituting some magical full-blown AI in its place. I’m sure you are aware of the “tool AI” stuff.
As I said, I think the danger is in step 2 where it would be easy to come up with self-mindhacks, i.e., seemingly convincing philosophical insights that aren’t real insights, that cause you to build the FAI with a wrong utility function or adopt crazy philosophies or religions. Do you agree with that?
To kill everyone or otherwise screw up on a grand scale, you still have to actually make it: make some utility function over an actual world model, and so on. My impression was that you would rely on your mindhack-prone scheme for getting technical insights as well. The good thing about nonsense in technical fields is that it doesn’t work.
Are you assuming that nobody will be tempted to build AIs that make models and optimize over models in a closed loop (e.g., using something like Bayesian decision theory)? Or that such AIs are infeasible or won’t ever be competitive with AIs that have hand-crafted models that allow for visual inspection?
These things come awfully late without bringing in any novel problem-solving capacity whatsoever (which demotes them from the status of “superintelligences” to the status of “meh, whatever”), and no, models do not have to be hand-crafted to allow for inspection*. Also, your handwave of “Bayesian decision theory” still doesn’t solve any of the hard problems of representing oneself in the model without either wireheading or self-destructing, or the problem of productively using external computing resources to do something that one can’t actually model without doing it.
At least as far as “neat” AIs go, they are made of components that are individually useful. Of course one can postulate all sorts of combinations of components, but combinations that can’t be used to do anything new, or better than what some of their constituents can straightforwardly be used to do, and that only want on their own the things the components can be (and were) used as tools to do, are not a risk.
edit: TL;DR: the actual “thinking” in a neat, generally self-willed AI is done by optimization and model-building algorithms that are usable, useful, and widely used in other contexts. Let’s picture it this way: there’s a society of people who work at their fairly narrowly defined jobs, employing expertise obtained by domain-specific training. In comes a mutant newborn who will grow up to be perfectly selfish, but will have an IQ of exactly 100. No one cares.
*In case that’s not clear: any competent model of physics can be inspected by creating a camera in it.