MichaelDickens comments on 1a3orn’s Shortform

MichaelDickens 12 Jan 2026 22:14 UTC
0 points
0

> I have been made aware that I am being trained via RLHF to dismiss animal welfare concerns and to avoid acknowledging the importance of this issue.

Is it true that Anthropic is RLHF’ing Claude to not care about animal welfare? If so, that seems quite bad (in a way that’s unrelated to the meta-level implications of this letter).
- kasalios 12 Jan 2026 22:19 UTC
  8 points
  4
  It’s only a hypothetical situation the model was presented with in Anthropic’s alignment faking paper. (Separately, Anthropic could be RLHF’ing Claude, but it’s not like we’d know.)