This is a great experiment, similar to some I’ve been thinking about over the last few months; thanks for running it. I’d be curious what the results are like for stronger models (and, if they do, whether that substantially changes their answers in interesting ways). My motivations are mostly to dig up evidence of models having qualities relevant to moral patienthood, but it would also be interesting from various safety perspectives.
I’d be curious what the results are like for stronger models
Me too! Unfortunately I’m not aware of any SAEs on stronger models (except Anthropic’s SAEs on Claude, but those haven’t been shared publicly).
My motivations are mostly to dig up evidence of models having qualities relevant to moral patienthood, but it would also be interesting from various safety perspectives.
I’m interested to hear your perspective on what the results of this experiment might say about moral patienthood.