Thank you for sharing the resource and the usage policies. It’s a really fascinating and difficult problem, especially in those grey areas where intent is ambiguous. I wonder if there’s a path to bring human experts (like psychologists or ethicists) directly into the loop when there’s risk involved? That is, rather than training the model on their expert opinions, give it a direct line to them whenever potentially dangerous content is flagged. At the same time, maybe the harder question is whether these conversations should happen on the platform at all, not because models can’t handle them, but because the potential harm might outweigh the benefit.
I think this is a good idea in certain contexts. Right now we have a weak version of this where we give people a number to call if they seem to be showing signs of being suicidal. Stay tuned for more ideas in this direction later this year.