Shayne O'Neill comments on OpenAI: Detecting misbehavior in frontier reasoning models

Shayne O'Neill 11 Mar 2025 4:39 UTC
3 points
Sure, but "thinking out loud" isn't the whole picture. There's always a tonne of cognition going on before words leave the lips, and I guess it's also gonna depend on how early in its training process it learns to "count on its fingers". If it's just taking cGPT and then adding a bunch of "count on your fingers" training, it's gonna be thinking "Well, I can solve complex Navier-Stokes problems in my head faster than you can flick your mouse to scroll down to the answer, but FINE, I'LL COUNT ON MY FINGERS".
- Seth Herd 11 Mar 2025 7:04 UTC
  5 points
  The fastest route to solving a complex problem and showing your work is often to just show the work you're doing anyway. That's what teachers are going for when they demand it. If you had some reason to make up fake work instead, you could. But you'd need a reason.
  
  Here it may be relevant that some of my friends did make up fake work when they used shortcut techniques, like guessing the answer, in algebra.
  
  Sure, it would be better to have a better alignment strategy. But I know of no plausible route to getting people to stop developing LLMs and LLM agents. So attempts at training for faithful CoT seem better than nothing.
  
  So I think we should really try to get into specifics. If there are convincing reasons, either theoretical or empirical, to think the real cognition happens outside the CoT, those reasons would keep people from trusting CoT when they shouldn't. Raising the possibility of fake CoTs without specific arguments for why they'd be outright deceptive is far less compelling, and probably not enough to change the direction of progress.