The fastest route to solving a complex problem and showing your work is often to just show the work you're doing anyway. That's what teachers are going for when they demand it. If you had some reason to make up fake work instead, you could. But you'd need a reason.
Here it may be relevant that some of my friends did make up fake work when they used shortcuts like guessing the answer in algebra.
Sure, a better alignment strategy would be preferable. But I know of no plausible route to getting people to stop developing LLMs and LLM agents. So attempting to train for faithful CoT seems better than not trying.
So I think we should really try to get into specifics. If there are convincing reasons, theoretical or empirical, to think the real cognition is done outside of the CoT, they would keep people from trusting CoT when they shouldn't. Raising the possibility of fake CoTs without specific arguments for why they'd be outright deceptive is far less compelling, and probably not enough to change the direction of progress.