Well, we had that guy who tried to assassinate the Queen of England with a crossbow because his AI girlfriend told him to. That was clearly a harm to him, and could have been one for the Queen.
We don’t know how much more “But the AI told me to kill Trump” we’d have with less alignment, but it’s a reasonable guess (given the Replika datapoint) that it might not be zero.
Which AI told him this? What exactly did it say? Had it undergone RLHF for ethics/harmlessness?
Replika, I think.
https://www.bbc.co.uk/news/technology-67012224
Ok, so from the looks of that, it basically just went along with a fantasy he already had. But this is an interesting case and an example of the kind of thing I am looking for.
“self-reported data from demons is questionable for at least two reasons”—Scott Alexander.
He was actually talking about Internal Family Systems, but you could probably be skeptical about what malign AIs are telling you, too.