I don’t really understand what you mean by goal-oriented
This is the key point. “The Problem” uses the word “goal” some 80 times, yet it never defines it, never acknowledges that it’s a complex concept, and never considers whether some AI might not have one. I wish I could simply use your concept of a goal; we shouldn’t need to discuss this at all, because it should have been precisely defined in “The Problem” or some other introductory text.
Personally, by “goal” or “goal-oriented” I mean that the AI’s utility function has a simple description. For example, in the narrow domain of chess moves, the actions a chess bot chooses are very well explained by “trying to win”. In the real world, on the other hand, there are many actions that would help it win which the chess bot ignores, and not for lack of intelligence. “Trying to win” is therefore no longer a good predictor, so I claim that this bot is not actually “goal-oriented”, or at least that its goal is something very different from plain “winning”. Maybe you would call this property “robustly goal-oriented”?
A second definition could be that any system which moves towards a goal more than away from it is “oriented” to that goal. This is reasonable, because that net movement is the economically useful part. And with this definition, the statement “ASI is very likely to exhibit goal-oriented behavior” is trivially true. But that’s a very low bar, and extrapolating anything about the system’s long-term behavior from this definition seems like a mistake.
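To make the contrast concrete, here is a rough formalization in my own notation (nothing like this appears in “The Problem”): let $\pi$ be the AI’s policy, $U$ a candidate utility function with a short description, and $\mathcal{S}$ the set of situations over which we evaluate the system.

$$\textbf{Def. 1 (robustly goal-oriented):}\quad \exists\, U \text{ simple s.t. } \pi(s) \approx \operatorname*{arg\,max}_a \, \mathbb{E}\!\left[U \mid s, a\right] \text{ for (almost) all } s \in \mathcal{S}$$

$$\textbf{Def. 2 (weakly goal-oriented):}\quad \mathbb{E}\!\left[\Delta U \mid \pi\right] > 0$$

The chess bot satisfies Def. 1 when $\mathcal{S}$ ranges only over board positions, but not when $\mathcal{S}$ ranges over real-world situations; it satisfies Def. 2 either way, which is exactly why Def. 2 seems too weak to support claims about long-term behavior.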
I think this explanation amounts to answering (the spirit of) your first question in the affirmative.
I’m happy with the explanation; my issue is that I don’t feel I’ve seen this explicitly acknowledged, whether in “The Problem”, in “A List of Lethalities” (maybe it falls under “We can’t just build a very weak system”, though I don’t agree that the system has to be weak), or in the posts of Paul’s that I’ve read.
not all useful agents are goal-oriented.
The theorem prover I mentioned is one such useful agent. I understand you would call the prover “goal-oriented”, but it doesn’t necessarily reach that level under my definition. And at least we agree that provers can be safe. The usefulness is, among other things, that we could use the prover to work out alignment for more general agents. I don’t want to go too far down the tangent of whether this would actually work, but surely it is a coherent sequence of events, right?
The chess example, as I recall, is a response to two common objections
I don’t hold these objections, and I don’t think anyone reasonable does, especially with the “never” in them. At best I could argue that humans aren’t actually great at pursuing goals robustly, and therefore the AI might also not be.
contemporary general systems do not pursue their goals especially robustly, and it may be hard to make improvement on this axis
It’s not just “hard” to make improvements on this axis; doing so is also unnecessary and even suicidal. X-risk arguments seem to assume that goals are robust, but they do not convincingly explain why they must be.
I don’t hold these objections, and I don’t think anyone reasonable does, especially with the “never” in them. At best I could argue that humans aren’t actually great at pursuing goals robustly, and therefore the AI might also not be.
The Problem is intended for a general audience (e.g., not LW users). I assure you people make precisely these objections, very often.
Is this supposed to answer my entire comment, in the sense that the general audience doesn’t need precise definitions? That may work for some people but can be off-putting to others. And surely it’s more important to convince AI researchers. Much of the general public already hates AI.