However, because of their rigid structure, confessions may not surface “unknown-unknowns” in the way a chain-of-thought can. A model might confess honestly to questions we ask, but not to questions we did not know to ask for.
I’m curious if there are examples already of insights you’ve gained this way via studying chains of thought.
I’m curious if there are examples already of insights you’ve gained this way via studying chains of thought.
It’s hard to say since I am in the habit of reading COTs all the time, so I don’t have the counterfactual of what I could have learned without them..