Hmm. I’ve been using the term “goal misgeneralization” sometimes. I think the issue is:
You’re taking “generalization” to be a type of cognitive action / mental move that a particular agent can take.
I’m taking “generalization” as a neutral description of the basic, obvious fact that the agent gets rewards / updates in some situations, and then takes actions in other situations. Whatever determines those latter actions at the end of the day is evidently “how the AI generalized” by definition.
You’re taking the “mis” in “misgeneralization” to be normative from the agent’s perspective (i.e., the agent is “mis-generalizing” by its own lights). (Update: OR, maybe you’re taking it to be normative with respect to some “objective standard of correct generalization”??)
I’m taking the “mis” in “misgeneralization” to be normative from the AI programmer’s perspective (i.e., the AI is “generalizing” in a way that ~~makes the programmer unhappy~~ is wrong with respect to the intended software behavior [updated per Joe’s reply, see below]).
You’re welcome to disagree.
If this is right, then I agree that the thing you’re talking about in this post is a possible misunderstanding / confusion that we should be aware of. No opinion about whether people have actually been confused by this in reality; I didn’t check.
I think you’re correct, but I find “misgeneralization” an unhelpful word to use for “behaved in a way that made the programmer unhappy”. It suggests too strong an idea of some natural correct generalization. This seems needlessly likely to lead to muddled thinking (and miscommunication).
I guess I’d prefer “malgeneralization”: it’s not incorrect, but rather just an outcome I didn’t like.
Hmm, maybe, but I think there’s a normal situation in which a programmer wants and expects her software to do X, and then she runs the code and it does Y, and she turns to her friend and says “my software did the wrong thing”, or “my software behaved incorrectly”, etc. When she says “wrong” / “incorrect”, she means it with respect to the (implicit or explicit) specification / plan / idea-in-her-head.
I think that, in a similar way, using the word “misgeneralization” is arguably OK here. (I guess my “unhappy” wording above was poorly-chosen.)
Sure, I don’t think it’s entirely wrong to have started using the word this way (something akin to “misbehave” rather than “misfire”). However, when I take a step back and ask “Is using it this way net positive in promoting clear understanding and communication?”, I conclude that it’s unhelpful.
Maybe! I’m open-minded to alternatives. I’m not immediately sold on “malgeneralization” in particular being an improvement on net, but I dunno. 🤔
Yeah, me neither—mainly it just clarified the point, and is the first alternative I’ve thought of that seems not-too-bad. It still bothers me that it could be taken as short for “malicious/malign/malevolent generalization”.