Do you think we’ll ever have AIs that can accomplish complex real-world goals for us?
This comes with the constraint that specifying the goal must not be much more work than accomplishing it some other way.
How do I think it will not happen: through manual, unaided creation, with no inspection, no viewer, no nothing, of a magical cancer-curing utility function over a model domain so complex that you immediately fall back on “but we can’t look inside” when explaining why the model cannot be used, in lieu of empty speculation, to see how the cancer-curing works out.
How it can work: well, firstly, it ought to be rather obvious to you that the optimization algorithm (your “mathematical intuition”, but realistically with considerably less power than something like UDT would require) can self-improve a lot more effectively without an actual world model, and without an embedding of the self in that world model, than with them. So we have this for “FOOM”.
It can also build a world model without maximizing anything about the world, of course; indeed, a part of the decision theory you know how to formalize is concerned with just that. Realistically, one would want to start with some world-modelling framework more practical than “the space of all possible computer programs”.
Only once you have the world model can you realistically start making a utility function, and you do that with considerable feedback from running the optimization algorithm on just the model and inspecting the results.
I assume you realize that one has to go to great lengths to make runs of the optimization algorithm manifest themselves as real-world changes, whereas test dry runs on just the model are quite easy. I assume you also realize that very detailed simulations are very impractical.
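The dry-run workflow above can be sketched concretely. Everything below is an invented toy: a two-number “model state”, a toy utility, and toy dynamics, standing in for the real model, utility function, and simulation:

```cpp
#include <cassert>
#include <cmath>

// Toy model state: what the world model predicts after a candidate plan.
// All names and numbers here are illustrative assumptions, not the real thing.
struct ModelState {
    double tumour_mass;  // remaining tumour, arbitrary units
    double toxicity;     // side-effect burden, arbitrary units
};

// A toy "cancer-curing" utility, defined over model states only --
// the optimizer never touches the real world.
double utility(const ModelState& s) {
    return -s.tumour_mass - 10.0 * s.toxicity;
}

// Toy model dynamics: a dry run of the plan "administer this dose".
ModelState simulate(double dose) {
    return {std::exp(-dose), 0.05 * dose};  // more dose: less tumour, more toxicity
}
```

One would then run the optimizer over `dose`, inspect the predicted `ModelState` it proposes, and only afterwards decide whether to act on it; nothing in this loop produces real-world changes.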
edit: to borrow from a highly relevant Russian proverb, you cannot impress a surgeon with the dangers of a tonsillectomy performed through the anal passage.
Other ways it could work may involve neural-network simulation to the point where you get something thinking and talking (which you’d get in any case only after years and years of raising it), at which point it’s not that much different from raising a kid to do it, really, and very few people would get seriously worked up about the possibility of our replacement by that.
Once we have this self-improved optimization algorithm, do you think everyone who has access to it will be as careful as you’re assuming? As you say, it’s just a dangerous tool, like a lathe. But unlike a lathe, which can only hurt its operator, this thing could take over the world (via economic competition if not by killing everyone) and use it for purposes I’d consider pointless.
Do you agree with this? If not, how do you foresee the scenario playing out, once somebody develops a self-improving optimization algorithm that’s powerful enough to be used as part of an AI that can accomplish complex real-world goals? What kind of utility functions do you think people will actually end up making, and what will happen after that?
Once we have this self-improved optimization algorithm, do you think everyone who has access to it will be as careful as you’re assuming?
It looks to me like not performing a tonsillectomy via the anal passage doesn’t require too much carefulness on the part of the surgeon.
One can always come up with some speculative way in which a particular technological advance could spell our doom. Or avert it, since the gradual improvement of optimization algorithms allows for intelligence enhancement and other things that, overall, should lower the danger. The fact that you can generate a scenario in favour of either (starting from the final effect) is entirely uninformative about the total influence.
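The “uninformative” point can be made precise with Bayes’ rule: if a doom scenario can be constructed whether or not doom is actually likely, the likelihood ratio is 1 and the posterior equals the prior. A minimal numeric sketch (the probabilities are illustrative, not taken from the discussion):

```cpp
#include <cassert>
#include <cmath>

// Posterior P(H | E) from Bayes' rule, given the prior P(H) and the
// likelihoods P(E | H) and P(E | not H).
double posterior(double prior, double p_e_given_h, double p_e_given_not_h) {
    double evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior);
    return p_e_given_h * prior / evidence;
}
```

With E = “someone can write a plausible doom scenario” and P(E | doom) equal to P(E | no doom), `posterior` returns exactly the prior: zero update. Evidence only moves the estimate when the scenario would be noticeably harder to construct in one of the two worlds.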
how do you foresee the scenario playing out, once somebody develops a self-improving optimization algorithm that’s powerful enough to be used as part of an AI that can accomplish complex real-world goals? What kind of utility functions do you think people will actually end up making, and what will happen after that?
I think I need to ask a question here, to be able to answer this in a way that would be relevant to you. Suppose today you get a function that takes a string similar to:
struct domain {
    .... any data
};
real Function(domain value) {
    .... any code
}
and gives you, as a string, an initializer for “domain” which results in the largest output of the Function. It’s magically powerful, albeit some ridiculous things (an exact simulation from the Big Bang to the present day, inclusive of the computer running this very algorithm) are reasonably forbidden.
How do you think it can realistically be used, and what mistakes do you picture? Please be specific; when you have a mathematical function, describe its domain, or else I will just assume that the word “function” is meant to merely trigger some innate human notion of purpose.
edit: an extension of the specification. The optimization function now takes a string S and a real number between 0 and 1 specifying the “optimization power”, roughly corresponding to what you can reasonably expect to get by restricting the computational resources available to the optimizer.
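For concreteness, here is what that interface might look like as ordinary code, with a plain random search standing in for the magically powerful optimizer and the 0–1 “power” argument mapped onto an iteration budget. The `domain` contents are a placeholder, since the specification leaves them as “any data”:

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// Placeholder for the "any data" in the specification above.
struct domain { double x; };

// Toy optimizer: random search stands in for the hypothetical powerful one.
// `power` in [0, 1] scales the computational budget, per the spec extension.
domain Optimize(const std::function<double(const domain&)>& f, double power) {
    std::mt19937 rng(0);  // fixed seed for reproducibility
    std::uniform_real_distribution<double> dist(-100.0, 100.0);
    int budget = 1 + static_cast<int>(power * 100000);
    domain best{dist(rng)};
    double best_score = f(best);
    for (int i = 0; i < budget; ++i) {
        domain candidate{dist(rng)};
        double s = f(candidate);
        if (s > best_score) { best_score = s; best = candidate; }
    }
    return best;
}
```

The point of the sketch is only the shape of the interface: a scoring function in, a maximizing initializer out, with nothing in the loop ever acting on the world.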
Here’s what I’d do:
Step 1: Build an accurate model of someone’s mind. The domain would be the set of possible neural networks, and the Function would run the input neural net and compare its behavior to previously recorded behavior of the target person (perhaps a bunch of chat logs would be easiest), returning a value indicating how well it matches.
Step 2: Use my idea here to build an FAI.
In step 2 it would be easy to take fewer precautions and end up hacking your own mind. See this thread for previous discussion.
(Does this answer your question in the spirit that you intended? I’m not sure, because I’m not sure why you asked the question.)
Thanks. Yes, it does. I asked because I don’t want to needlessly waste a lot of time explaining that one would try to use the optimizer to do some of the heavy lifting (which makes it hard to predict an actual solution). What do you think reckless individuals would do?
By the way, your solution would probably just result in a neural network that hard-wires a lot of the recorded behaviour, without doing anything particularly interesting. Observe that an ideal model, given thermal noise, would not produce the best match, whereas a network that connects neurons in parallel to average out the noise and encode the data most accurately would. I am not sure whether fMRI would remedy the problem.
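The noise point generalizes: on any match-the-recording objective, a candidate that memorizes the recording outscores the true generating process whenever the recording contains noise. A toy numeric illustration (a sine wave plus Gaussian noise standing in for recorded behaviour; all numbers are invented):

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

// Mean squared mismatch between a candidate's outputs and a recording.
double mismatch(const std::vector<double>& candidate,
                const std::vector<double>& recording) {
    double sum = 0.0;
    for (size_t i = 0; i < recording.size(); ++i) {
        double d = candidate[i] - recording[i];
        sum += d * d;
    }
    return sum / recording.size();
}

// Returns {score of the true model, score of the memorizer} against a
// noisy recording of a sine-wave "behaviour".
std::pair<double, double> demo() {
    std::mt19937 rng(42);
    std::normal_distribution<double> noise(0.0, 1.0);
    std::vector<double> signal(1000), recording(1000);
    for (size_t i = 0; i < signal.size(); ++i) {
        signal[i] = std::sin(0.01 * i);         // the "true" behaviour
        recording[i] = signal[i] + noise(rng);  // what actually got recorded
    }
    // The true model reproduces the signal; the memorizer reproduces the
    // recording, noise and all, and therefore scores a perfect match.
    return {mismatch(signal, recording), mismatch(recording, recording)};
}
```

The memorizer’s mismatch is exactly zero, while the true model’s is roughly the noise variance, so an optimizer maximizing match quality prefers the memorizer.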
edit: note that this mishap results in Wei_Dai getting some obviously useless answers, not in world destruction.
edit2: by the way, note that an infinite-torque lathe motor, while in some sense capable of infinite power output, doesn’t imply that you can make a mistake that will spin up the Earth and make us all fly off. You need a whole lot of extra magic for that. Likewise, the “outcome pump” needs through-the-wall 3D scanners to be that dangerous to the old woman, and “UFAI” needs some potentially impossible self-references in the world model and a lot of other magic. Bottom line: there is this jinn/golem/terminator meme, and it gets rationalized in a science-fictional way. The fact that the golem can be rationalized in a science-fictional way provides no information about the future (because I expect it to be rationalizable in such a manner irrespective of the future), hence zero update; hence, if I wasn’t worried, I won’t start to worry. Especially considering how often the AI is the bad guy, I really don’t see any reason to think these issues are under-publicized in any way. Whereas the fact that it is awfully hard to rationalize that superdanger when you start with my optimizer (where no magic bans you from making models that you can inspect visually) provides information against the notion.
I don’t think anyone is claiming that any mistake one might make with a powerful optimization algorithm is a fatal one. As I said, I think the danger is in step 2 where it would be easy to come up with self-mindhacks, i.e., seemingly convincing philosophical insights that aren’t real insights, that cause you to build the FAI with a wrong utility function or adopt crazy philosophies or religions. Do you agree with that?
Whereas the fact that it is awfully hard to rationalize that superdanger when you start with my optimizer (where no magic bans you from making models that you can inspect visually) provides information against the notion.
Are you assuming that nobody will be tempted to build AIs that make models and optimize over models in a closed loop (e.g., using something like Bayesian decision theory)? Or that such AIs are infeasible or won’t ever be competitive with AIs that have hand-crafted models that allow for visual inspection?
I don’t think anyone is claiming that any mistake one might make with a powerful optimization algorithm is a fatal one.
Well, some people do, by a trick of substituting some magical full-blown AI in place of it. I’m sure you are aware of the “tool AI” stuff.
As I said, I think the danger is in step 2 where it would be easy to come up with self-mindhacks, i.e., seemingly convincing philosophical insights that aren’t real insights, that cause you to build the FAI with a wrong utility function or adopt crazy philosophies or religions. Do you agree with that?
To kill everyone or otherwise screw up on a grand scale, you still have to actually make it: make some utility function over an actual world model, and so on. My impression was that you would rely on your mindhack-prone scheme for getting technical insights as well. The good thing about nonsense in technical fields is that it doesn’t work.
Are you assuming that nobody will be tempted to build AIs that make models and optimize over models in a closed loop (e.g., using something like Bayesian decision theory)? Or that such AIs are infeasible or won’t ever be competitive with AIs that have hand-crafted models that allow for visual inspection?
These things come awfully late without bringing in any novel problem-solving capacity whatsoever (which degrades them from the status of “superintelligences” to the status of “meh, whatever”), and no, models do not have to be hand-crafted to allow for inspection*. Also, your handwave of “Bayesian decision theory” still doesn’t solve any of the hard problems: representing oneself in the model without either wireheading or self-destructing, or productively using external computing resources to do something that one can’t actually model without doing it.
At least as far as “neat” AIs go, those are made of components that are individually useful. Of course, one can postulate all sorts of combinations of components, but combinations that can’t be used to do anything new or better than what some of their constituents can straightforwardly do, and that only want on their own the things those components can be (and were) used as tools to do, are not a risk.
edit: TL;DR: the actual “thinking” in a neat, generally self-willed AI is done by optimization and model-building algorithms that are usable, useful, and widely used in other contexts. Let’s picture it this way: there’s a society of people who work at their fairly narrowly defined jobs, employing expertise they obtained through domain-specific training. In comes a mutant newborn who will grow up to be perfectly selfish, but will have an IQ of exactly 100. No one cares.
*In case that’s not clear: any competent model of physics can be inspected by creating a camera in it.
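To make the footnote concrete: a “camera” is just a read-only probe that samples the model’s state along view rays, so it works for any model that exposes state at positions. A minimal sketch over a toy 2-D occupancy-grid world (the grid model itself is an assumption for illustration, standing in for a real physics model):

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Toy world model: a 2-D occupancy grid. All the camera needs from the
// model is read access to its state.
struct World {
    std::vector<std::string> grid;  // '#' = occupied, '.' = empty
    bool occupied(double x, double y) const {
        int ix = static_cast<int>(x), iy = static_cast<int>(y);
        if (iy < 0 || iy >= static_cast<int>(grid.size())) return false;
        if (ix < 0 || ix >= static_cast<int>(grid[iy].size())) return false;
        return grid[iy][ix] == '#';
    }
};

// Read-only "camera": casts a ray from (x, y) along `angle` and reports the
// distance to the first occupied cell (or max_range if nothing is hit).
// It inspects the model without modifying it.
double camera_ray(const World& w, double x, double y, double angle,
                  double max_range = 50.0) {
    double dx = std::cos(angle), dy = std::sin(angle);
    for (double t = 0.0; t < max_range; t += 0.1)
        if (w.occupied(x + t * dx, y + t * dy)) return t;
    return max_range;
}
```

Sweeping `angle` over a full circle turns this into a panoramic view of the model’s state; the inspection requires no cooperation from whatever built the model, only read access.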