You can’t demonstrate negligence by failing to do something that has no meaningful effect (or might even be harmful) to the risk that you are supposedly being negligent towards. Ignoring safety theater is not negligence.
Update: xAI says that the load-bearing thing for avoiding bio/chem misuse from Grok 4 is not inability but safeguards, and that Grok 4 robustly refuses “harmful queries.” So I think Igor is correct. If the Grok 4 misuse safeguards are ineffective, that shows that xAI failed at a basic safety thing it tried (and either doesn’t understand that or is lying about it).
I agree it would be a better indication of future-safety-at-xAI if xAI said “misuse mitigations for current models are safety theater.” That’s just not its position.
How do you feel about interactive self-harm instructions being readily available? As I mentioned, this seems like the most relevant case at the moment.
Not sure, do you have a link to what kind of behavior you are referring to?
One of them is mentioned in the article; here is another example: https://x.com/eleventhsavi0r/status/1945432457144070578?s=46
Yeah, this seems like one of those things where I think maximizing helpfulness is marginally good. I am glad it’s answering this question straightforwardly instead of doing a thing where it tries to use its own sense of moral propriety.
I don’t really see anyone being seriously harmed by this (like, this specific set of instructions clearly is not causing harm).
There are other wordings that would lead to similar categories of answers, especially later in a conversation (this one was optimized for a short prompt and for turn 1). I suppose I should try to construct a scenario chat where Grok ends up providing inappropriate assistance to a user who is clearly in crisis? Though I don’t know how relevant that would really be.