Thanks. Yes, it does. I asked because I don’t want to needlessly waste a lot of time explaining that one would try to use the optimizer to do some of the heavy lifting (which makes it hard to predict an actual solution). What do you think reckless individuals would do?
By the way, your solution would probably just result in a neural network that hard-wires a lot of the recorded behaviour, without doing anything particularly interesting. Observe that an ideal model, given thermal noise, would not give the best match to the recording, whereas a network that connects neurons in parallel to average out the noise, and so encodes the data most accurately, would. I am not sure if fMRI would remedy the problem.
edit: note that this mishap results in Wei_Dai getting some obviously useless answers, not in world destruction.
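To make the noise-averaging point concrete, here is a toy sketch, not anyone's actual proposal. Everything in it is an assumption made for illustration: the sawtooth "signal", Gaussian noise standing in for thermal noise, and the names `simulate`, `sigma` and `n_parallel`. It compares a faithful model (right signal, its own fresh noise) with a model that just stores the recording and averages several noisy units in parallel.

```python
import random
import statistics


def simulate(n_samples=2000, sigma=1.0, n_parallel=16, seed=0):
    """Return (faithful_mse, hardwired_mse) against one noisy recording."""
    rng = random.Random(seed)

    # Hypothetical underlying signal, plus a single noisy recording of it.
    signal = [(i % 50) / 50.0 for i in range(n_samples)]
    recording = [s + rng.gauss(0.0, sigma) for s in signal]

    # Faithful model: reproduces the true signal, with fresh noise on each run.
    faithful = [s + rng.gauss(0.0, sigma) for s in signal]

    # Hard-wired model: stores the recording itself and averages
    # n_parallel independent noisy units to suppress its own noise.
    hardwired = [
        statistics.fmean(r + rng.gauss(0.0, sigma) for _ in range(n_parallel))
        for r in recording
    ]

    def mse(a, b):
        return statistics.fmean((x - y) ** 2 for x, y in zip(a, b))

    return mse(faithful, recording), mse(hardwired, recording)


if __name__ == "__main__":
    faithful_err, hardwired_err = simulate()
    print("faithful model MSE vs recording:  ", faithful_err)
    print("hard-wired model MSE vs recording:", hardwired_err)
```

Under these assumptions the faithful model's error against the recording should sit near 2·sigma², while the hard-wired one should sit near sigma²/n_parallel, so a fit-to-recording objective prefers the uninteresting model.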
edit2: by the way, note that an infinite-torque lathe motor, while in some sense capable of infinite power output, doesn’t imply that you can make a mistake that will spin up the earth and make us all fly off. You need a whole lot of extra magic for that. Likewise, the “outcome pump” needs through-the-wall 3D scanners to be that dangerous to the old woman, and “UFAI” needs some potentially impossible self-references in the world model and a lot of other magic. Bottom line: there is this jinn/golem/terminator meme, and it gets rationalized in a science-fictional way, and the fact that the golem can be rationalized in that way provides no information about the future (because I expect it to be rationalizable in such a manner irrespective of the future), hence zero update, hence if I didn’t worry before, I won’t start to worry now. Especially considering how often the AI is the bad guy, I really don’t see any reason to think that the issues are under-publicized in any way. Whereas the fact that it is awfully hard to rationalize that superdanger when you start with my optimizer (where no magic bans you from making models that you can inspect visually), provides the information against the notion.
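The “zero update” step above is just Bayes’ rule with equal likelihoods. A minimal sketch, with a hypothetical helper `posterior` and made-up numbers: if the evidence (the golem story being rationalizable) is expected equally whether or not the danger is real, the posterior equals the prior.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a binary hypothesis, given one piece of evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator


if __name__ == "__main__":
    # Evidence equally likely either way: no update (illustrative numbers).
    print(posterior(0.1, 0.9, 0.9))
    # Evidence more likely if the hypothesis is true: belief goes up.
    print(posterior(0.1, 0.9, 0.3))
```

The numbers are placeholders; the only thing the sketch shows is that a likelihood ratio of 1 leaves the prior where it was.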
I don’t think anyone is claiming that any mistake one might make with a powerful optimization algorithm is a fatal one. As I said, I think the danger is in step 2 where it would be easy to come up with self-mindhacks, i.e., seemingly convincing philosophical insights that aren’t real insights, that cause you to build the FAI with a wrong utility function or adopt crazy philosophies or religions. Do you agree with that?
Whereas the fact that it is awfully hard to rationalize that superdanger when you start with my optimizer (where no magic bans you from making models that you can inspect visually), provides the information against the notion.
Are you assuming that nobody will be tempted to build AIs that make models and optimize over models in a closed loop (e.g., using something like Bayesian decision theory)? Or that such AIs are infeasible or won’t ever be competitive with AIs that have hand-crafted models that allow for visual inspection?
I don’t think anyone is claiming that any mistake one might make with a powerful optimization algorithm is a fatal one.
Well, some people do, by the trick of substituting some magical full-blown AI in place of it. I’m sure you are aware of the “tool AI” stuff.
As I said, I think the danger is in step 2 where it would be easy to come up with self-mindhacks, i.e., seemingly convincing philosophical insights that aren’t real insights, that cause you to build the FAI with a wrong utility function or adopt crazy philosophies or religions. Do you agree with that?
To kill everyone or otherwise screw up on the grand scale, you still have to actually make it, make some utility function over an actual world model, and so on, and my impression was that you would rely on your mindhack-prone scheme for getting technical insights as well. The good thing about nonsense in technical fields is that it doesn’t work.
Are you assuming that nobody will be tempted to build AIs that make models and optimize over models in a closed loop (e.g., using something like Bayesian decision theory)? Or that such AIs are infeasible or won’t ever be competitive with AIs that have hand-crafted models that allow for visual inspection?
These things come awfully late without bringing in any novel problem-solving capacity whatsoever (which degrades them from the status of “superintelligences” to the status of “meh, whatever”), and no, models do not have to be hand-crafted to allow for inspection*. Also, your handwave of “Bayesian decision theory” still doesn’t solve any of the hard problems of representing oneself in the model while neither wireheading nor self-destructing. Nor does it solve the problem of productively using external computing resources to do something that one can’t actually model without doing it.
At least as far as “neat” AIs go, those are made of components that are individually useful. Of course one can postulate all sorts of combinations of components, but combinations that can’t be used to do anything new or better than what some of the constituents can be straightforwardly used to do, and that only want, on their own, things that the components can be and were used as tools to do, are not a risk.
edit: TL;DR: the actual “thinking” in a neat, generally self-willed AI is done by optimization and model-building algorithms that are usable, useful, and widely used within other contexts. Let’s picture it this way. There’s a society of people who work on their fairly narrowly defined jobs, employing expertise that they obtained by domain-specific training. In comes a mutant newborn who will grow to be perfectly selfish, but will have an IQ of exactly 100. No one cares.
*in case that’s not clear, any competent model of physics can be inspected by creating a camera in it.