I agree with the claim that deception could arise without deceptive alignment, and mostly agree with the post, but I do still think it’s very important to recognize that if/when deceptive alignment fails to work, it changes a lot of the conversation around alignment.
What do you mean by “when deceptive alignment fails to work”? I’m confused.
I think Noosphere89 meant to say “when deceptive alignment doesn’t happen” in that sentence. (They can correct me if I’m wrong.)
Anyway, I think I’m in agreement with Noosphere89 that (1) it’s eminently reasonable to try to figure out whether or not deceptive alignment will happen (in such-and-such AI architecture and training approach), and (2) it’s eminently reasonable to have significantly different levels of overall optimism or pessimism about AI takeover depending on the answer to question (1). I hope this post does not give anyone an impression contrary to that.
Yep, that’s what I was talking about, Seth Herd.