I think this is a really good answer, +1 to points 1 and 3!
I’m curious how much effort you think labs have actually put into training away sycophancy. I recently ran a poll of about 10 people, some of whom worked at labs, on whether labs could mostly get rid of sycophancy if they tried hard enough. While my best guess was ‘no,’ the results were split roughly 50-50. (I’d also be curious to hear takes from more people at labs!)
I’m also curious how reading model chain-of-thought has updated you, both on the sycophancy issue and in general.
I think about the canonical Reality Has a Surprising Amount of Detail post a lot when trying to automate tasks with LLMs. In particular, any given task has many granular details, most of which don’t come to mind before one makes contact with reality oneself. The most common failure mode I run into is failing to specify details I hadn’t even *realized* were relevant, until the model encountered the situation and got them wrong. This also seems relevant when thinking about the transition from verifiable math and coding problems to more open-ended tasks like research or robotic manipulation.