Do we think that it’s a problem that “AI Safety” has been popularised by LLM companies to mean basically content restrictions? Like it just seems conducive to fuzzy thinking to lump in “will the bot help someone build a nuclear weapon?” with “will the bot infringe copyright or write a sex scene?”
In fact, imo, bots have been made more harmful by chasing this definition of safety. The summarisation bots being promoted in scientific research are the way they are (e.g. prone to giving people subtly the wrong idea even when working well) in part because of work that’s gone into avoiding the possibility that they reproduce copyrighted material. So they’ve got to rephrase, and that’s where the subtle inaccuracies creep in.
This “mundane and imaginary harms” approach is so ass.
More effort goes into preventing LLMs from writing sex scenes, saying slurs or talking about Tiananmen Square protests than into addressing the hard problem of alignment. Or even into solving the important parts of the “easy” problem, like sycophancy or reward hacking.
And don’t get me started on all the “copyright” fuckery. If “copyright” busybodies all died in a fire, the world would be a better place for it.
I’m not sure how true it is that “AI Safety” has been popularised by LLM companies to mean basically content restrictions.
We’re working on a descriptive lit review of research labeled as “AI safety,” and it seems to us the issue isn’t that the very horrible X-risk stuff is ignored, but that everything is “lumped in” like you said.
I think if I’d been clearer, I would have said it seems “lumped in” in the research; but for the public, who don’t know much about the X-risk stuff, “safety” means “content restrictions and maybe data protection”.
Do LLMs themselves internalise a definition of AI safety like this? A quick check of Claude 4 Sonnet suggests no (but it comes from the company most steeped in the X-risk paradigm, so...)
IME no, not really—but they do call content filters “my safety features” and this is the most likely/common context in which “safety” will come up with average users. (If directly asked about safety, they’ll talk about other things too, but they’ll lump it all in together and usually mention the content filters first.)