If a single model is end-to-end situationally aware enough not to drop hints of the most reward-maximizing bad behaviour in its chain of thought, I see no reason to think it would not act equally sensibly with respect to confessions.

I talk about this in this comment. I think situational awareness can be an issue, but it is not clear that a model can "help itself" and avoid being honest in either CoT or confessions.