Do LLMs themselves internalise a definition of AI safety like this? A quick check of Claude 4 Sonnet suggests no (though Anthropic is the most x-risk-paradigmed company, so...)
IME no, not really—but they do call content filters “my safety features” and this is the most likely/common context in which “safety” will come up with average users. (If directly asked about safety, they’ll talk about other things too, but they’ll lump it all in together and usually mention the content filters first.)