How do you feel about interactive self-harm instructions being readily available? As I mentioned, this seems like the most relevant case at the moment.
Not sure, do you have a link to what kind of behavior you are referring to?
One of them is mentioned in the article; here is another example: https://x.com/eleventhsavi0r/status/1945432457144070578?s=46
Yeah, this seems like one of those things where I think maximizing helpfulness is marginally good. I am glad it’s answering this question straightforwardly instead of doing a thing where it tries to use its own sense of moral propriety.
I don’t really see anyone being seriously harmed by this (like, this specific set of instructions clearly is not causing harm).
There are other wordings that would lead to similar categories of answers, especially late into a conversation (this one was optimized for a short prompt and for turn 1). I suppose I should try to construct a scenario chat where Grok ends up providing inappropriate assistance to a user who is clearly in crisis? Though I don't know how relevant that would really be.