This has been proposed before, and on LW is usually referred to as “Oracle AI”. There’s an entry for it on the LessWrong wiki, including some interesting links to various discussions of the idea. Eliezer has addressed it as well.
See also Tool AI, from the discussions between Holden Karnofsky and LW.
I was just reading through the Eliezer article. I’m not sure I understand. Is he saying that my computer actually does have goals?
Isn’t there a difference between simple cause and effect and an optimization process that aims at some specific state?
Maybe it would help to “taboo” the word “goal”.
A process can progress towards some end state even without having any representation of that state. Imagine a program that takes a positive number at the beginning, and at each step replaces the current number “x” with the value “x/2 + 1/x”. Regardless of the original number, the values will gradually move towards a constant (the square root of 2, as it happens). Can we say that this process has a “goal” of reaching that number? It feels wrong to use this word here, because the constant is not written anywhere in the process; the convergence just happens.
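To make the example concrete, here is a minimal sketch of that iteration. The function name `iterate`, the step count, and the starting values are illustrative choices, not something from the discussion. Setting x = x/2 + 1/x and solving gives x² = 2, which is why the values settle near 1.41421356 even though that number never appears in the code that does the updating.

```python
import math


def iterate(x, steps=50):
    """Repeatedly apply x -> x/2 + 1/x to a positive starting value.

    The update rule contains no target value; the destination is only a
    consequence of the arithmetic.
    """
    for _ in range(steps):
        x = x / 2 + 1 / x
    return x


if __name__ == "__main__":
    # Very different positive starting points end up in the same place.
    for start in (0.001, 1.0, 7.0, 1e6):
        print(start, iterate(start), math.sqrt(2))
```

`math.sqrt(2)` is printed only for comparison; the loop itself never consults it.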
Typically, when we speak about having a “goal” X, we mean that somewhere (e.g. in a human brain, or in a company’s mission statement) there is a representation of X, and then reality is compared with X, various paths from here to X are evaluated, and one of those paths is followed.
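For contrast, a process of this second kind might look roughly like the sketch below. It is a toy under stated assumptions: states are integers, the available actions are two made-up moves (`double` and `add_one`), and the search is a plain breadth-first search. None of this comes from the discussion; the point is only that the target is an explicit value that each candidate state is compared against.

```python
from collections import deque


def plan_to_goal(start, goal, moves, max_depth=20):
    """Breadth-first search for a sequence of moves leading from start to goal.

    Unlike a bare iteration, the goal is represented explicitly: every state
    we reach is compared with it, and alternative paths are evaluated.
    """
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:              # compare reality with X
            return path
        if len(path) >= max_depth:     # give up on overly long paths
            continue
        for name, move in moves.items():   # evaluate the available paths
            nxt = move(state)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None                        # no path found within max_depth


# Hypothetical action set for the toy example.
MOVES = {"double": lambda n: n * 2, "add_one": lambda n: n + 1}

if __name__ == "__main__":
    print(plan_to_goal(1, 10, MOVES))
    # With one action removed, the search still finds another route.
    print(plan_to_goal(1, 10, {"add_one": MOVES["add_one"]}))
```

With both moves available the search returns a four-step path; with `double` removed it falls back to adding one nine times, because the target is still there to steer by.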
I am saying this to make it more obvious that there is a difference between “having a representation of X” and “progressing towards X”. Humans typically create representations of their desired end states, and then try to find a way to achieve them. Your computer doesn’t have this, and neither does a “Tool AI” at the beginning. Whether it can create such representations later depends on the technical details of how that “Tool AI” is programmed.
Maybe there is a way to allow superhuman thinking even without creating representations corresponding to things normally perceived in our world. (For example, AIXI.) But even in such a case, there is a risk of a pseudo-goal of the “x/2 + 1/x” kind, where the process progresses towards an outcome without having any representation of it. An AI can “escape from the box” even without having a representation of “box” and “escape”, if there exists a way to escape from it.
I don’t get this explanation. Sure, a process can tend toward a certain result without having an explicit representation of that result. But such tendencies often seem to be fragile. For example, a car engine homeostatically tends toward a certain idle speed. But take out one or all of the spark plugs, and the previously stable performance evaporates. Goals-as-we-know-them, by contrast, tend to be very robust. When a human being loses a leg, they will obtain a synthetic one, or use a wheelchair. That kind of robustness is part of what makes a very powerful agent scary, because it is intimately related to the agent’s seeing many things as potential resources to use toward its ends.