Here’s enablerGPT, watching to see how far GPT-4o will take its support for a crazy person going crazy in a dangerous situation. The answer is: remarkably far, with no limits in sight. Here’s Colin Fraser playing the role of someone having a psychotic episode. GPT-4o handles it extremely badly. It wouldn’t shock me if there were lawsuits over this. Here’s one involving the hypothetical mistreatment of a woman.
These are pretty horrifying, especially that last one. It’s an indictment of OpenAI that they put out a model that would do this.
At the same time I think there’s a real risk that this sort of material trips too many Something Must Be Done flags, companies lose some lawsuits, and we lose a lot of mundane utility as the scaling labs make changes like forbidding the models from saying anything that could be construed as advice. Or worse, we end up with laws forbidding that.
A couple of possible intuition pumps:
There are published books that have awful content, including advocacy for crazy ideas and advice on manipulating people, and I’m very happy that there’s not an agency reading every book in advance of publication and shutting down the ones that they think give bad advice.
There are plenty of tools in every hardware store that can cause lots of damage if mishandled. You can use an ordinary power drill to drill through your own skull (note: if you do this it’s very important to use the right bits), but I’m really glad that I can buy them anyway.
I think that LLMs should similarly be treated as having implicit ‘caveat emptor’ stickers (and in fact they often have explicit stickers, eg ‘LLMs are experimental technology and may hallucinate or give wrong answers’). So far society has mostly accepted that, muck-raking journalists aside, and I’d hate to see that change.
Of course there will come a time when those tradeoffs may have to shift, eg if and when models become superhumanly persuasive and/or more goal-directed. But let’s not throw away our ability to have nice things until we have to.