I think this is a really good answer, +1 to points 1 and 3!
I’m curious to what degree you think labs have put in significant effort to train away sycophancy. I recently ran a poll of about 10 people, some of whom worked at labs, on whether labs could mostly get rid of sycophancy if they tried hard enough. While my best guess was ‘no,’ the results were split around 50-50. (Would also be curious to hear more lab people’s takes!)
I’m also curious how reading model chain-of-thought has updated you, both on the sycophancy issue and in general.
RL seems to push the CoT towards being harder to understand (e.g. CoTs containing armies of dots, as happened with GPT-5), unless this is mitigated by paraphrasers. As for CoTs containing slop, humans also have chains of thought full of slop until the right idea somehow emerges.
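(To make the paraphraser idea concrete, here's a minimal sketch of what such a mitigation could look like, assuming a setup where each CoT chunk is rewritten by a separate model before being read or rewarded. The prompt, function names, and dummy model below are my own placeholders, not anyone's actual pipeline.)

```python
# Rough sketch of a CoT paraphraser pass. Everything here is a placeholder:
# `call_model` stands in for whatever LLM API the paraphraser would use.
from typing import Callable, List

PARAPHRASE_PROMPT = (
    "Rewrite the following chain-of-thought in plain English, preserving "
    "every step of the reasoning but none of the exact wording:\n\n{cot}"
)

def paraphrase_cot(cot_chunks: List[str], call_model: Callable[[str], str]) -> List[str]:
    """Run each CoT chunk through a separate paraphraser model.

    The idea: if the policy only gets reward for reasoning that survives
    paraphrasing, it has less incentive to drift into illegible tokens
    ("armies of dots"), since those carry no paraphrasable content.
    """
    return [call_model(PARAPHRASE_PROMPT.format(cot=chunk)) for chunk in cot_chunks]

# Example with a dummy model, just to show the plumbing:
if __name__ == "__main__":
    dummy = lambda prompt: "(paraphrased) " + prompt[-60:]
    print(paraphrase_cot(["First, factor the polynomial...", "....... . .. ."], dummy))
```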
IMO, a natural extension would be that 4o was raised on social media and, like influencers, wants to be liked. This was then reinforced by RLHF, or led 4o to conclude that humans like sycophancy. Either way, 4o’s ancestral environment rewarded sycophancy, and things rewarded by the ancestral environment are hard to unlearn.
Didn’t Kimi K2, which was trained mostly with RLVR and self-critique instead of RLHF, end up LESS sycophantic than anything else, including Claude 4.5 Sonnet, despite the situational awareness that Claude has and Kimi lacks? While mankind doesn’t have that many different models at around 4o’s level of ability, Adele Lopez claimed that DeepSeek believes itself to be writing a story while 4o wants to eat your life, and conjectured in private communication that “the different vibe is because DeepSeek has a higher percentage of fan-fiction in its training data, and 4o had more intense RL training”[1]