Daniel Kokotajlo comments on Why we are excited about confession!

Daniel Kokotajlo 14 Jan 2026 21:51 UTC
2 points
However, because of their rigid structure, confessions may not surface “unknown-unknowns” in the way a chain-of-thought can. A model might confess honestly to questions we ask, but not to questions we did not know to ask for.
I’m curious if there are examples already of insights you’ve gained this way via studying chains of thought.
- Boaz Barak 14 Jan 2026 21:59 UTC
  5 points
  It’s hard to say since I am in the habit of reading COTs all the time, so I don’t have the counterfactual of what I could have learned without them.