For me, o3 is currently winning the personality game. I much prefer the table obsession over the list obsession, it seems to be fairly non-sycophantic (I don’t think it ever emitted phatic “you’ve asked a very thoughtful question” praise at me), and its default behavior seems to involve trying hard to actually address your query/fulfil your task, in a way that is more goal-oriented than is usual for LLMs[1].
Claude 4’s behavior, by contrast, is still bog-standard LLM-y “vibing around”. Trying the Claude 4 models after o3 felt like a definitive downgrade.
(o3’s “trying hard” tendency sometimes overflows into specification gaming/lying-liar behaviors, which are extremely annoying and tedious. But it still feels much more powerful out-of-the-box for all tasks that aren’t coding, and other LLMs’ sycophancy is just about as annoying.)
Which is a bit concerning if you look at it from a perspective beyond mundane utility, of course...