Hmm. I’ve been using the term “goal misgeneralization” sometimes. I think the issue is:
You’re taking “generalization” to be a type of cognitive action / mental move that a particular agent can take.
I’m taking “generalization” as a neutral description of the basic, obvious fact that the agent gets rewards / updates in some situations, and then takes actions in other situations. Whatever determines those latter actions at the end of the day is evidently “how the AI generalized” by definition.
You’re taking the “mis” in “misgeneralization” to be normative from the agent’s perspective (i.e., the agent is “mis-generalizing” by its own lights). (Update: OR, maybe you’re taking it to be normative with respect to some “objective standard of correct generalization”??)
I’m taking the “mis” in “misgeneralization” to be normative from the AI programmer’s perspective (i.e., the AI is “generalizing” in a way that ~~makes the programmer unhappy~~ is wrong with respect to the intended software behavior [updated per Joe’s reply, see below]).
You’re welcome to disagree.
If this is right, then I agree that the thing you’re talking about in this post is a possible misunderstanding / confusion that we should be aware of. No opinion about whether people have actually been confused by this in reality; I didn’t check.
I think you’re correct, but I find “misgeneralization” an unhelpful word to use for “behaved in a way that made the programmer unhappy”. It suggests too strong an idea of some natural correct generalization. This seems needlessly likely to lead to muddled thinking (and miscommunication).
I guess I’d prefer “malgeneralization”: it’s not incorrect, but rather just an outcome I didn’t like.
Hmm, maybe, but I think there’s a normal situation in which a programmer wants and expects her software to do X, and then she runs the code and it does Y, and she turns to her friend and says “my software did the wrong thing”, or “my software behaved incorrectly”, etc. When she says “wrong” / “incorrect”, she means it with respect to the (implicit or explicit) specification / plan / idea-in-her-head.
I think that, in a similar way, using the word “misgeneralization” is arguably OK here. (I guess my “unhappy” wording above was poorly-chosen.)
Sure, I don’t think it’s entirely wrong to have started using the word this way (something akin to “misbehave” rather than “misfire”). However, when I take a step back and ask “Is using it this way net positive in promoting clear understanding and communication?”, I conclude that it’s unhelpful.
Maybe! I’m open-minded to alternatives. I’m not immediately sold on “malgeneralization” in particular being an improvement on net, but I dunno. 🤔
Yeah, me neither—mainly it just clarified the point, and is the first alternative I’ve thought of that seems not-too-bad. It still bothers me that it could be taken as short for “malicious/malign/malevolent generalization”.