How do you feel about interactive self-harm instructions being readily available? As I mentioned, this seems like the most relevant case at the moment.
Not sure, do you have a link to what kind of behavior you are referring to?
One of them is mentioned in the article; here is another example: https://x.com/eleventhsavi0r/status/1945432457144070578?s=46
Yeah, this seems like one of those things where I think maximizing helpfulness is marginally good. I am glad it’s answering this question straightforwardly instead of doing a thing where it tries to use its own sense of moral propriety.
I don’t really see anyone being seriously harmed by this (like, this specific set of instructions clearly is not causing harm).
There are other wordings that would lead to similar categories of answers, especially late into a conversation (this one was optimized for a short prompt and for turn 1). I suppose I should try to construct a scenario chat where Grok ends up providing inappropriate assistance to a user who is clearly in crisis? Though I don't know how relevant that would really be.