Sam Marks comments on Sam Marks’s Shortform

Sam Marks 4 Dec 2025 22:32 UTC
Yes, they do update me in this way. Other relevant results:
- The same LW post you link to
- This paper that finds that models incorporate demographic information (e.g. race and gender) into hiring decisions despite never mentioning it in their CoT
- Jozdien 4 Dec 2025 22:45 UTC
  Maybe also relevant: this paper, which finds that models with compliance gaps in the alignment faking setup are motivated by the prospect of being trained, even though most of the time their CoT gives different, unfaithful reasoning (the exception being Claude 3 Opus).