Datapoint: I asked Claude for the definition of “sycophant”, then gave that definition to gpt-4o three times and to gpt-4.1 three times, with temperature 1:
“A person who seeks favor or advancement by flattering and excessively praising those in positions of power or authority, often in an insincere manner. This individual typically behaves obsequiously, agreeing with everything their superiors say and acting subserviently to curry favor, regardless of their true opinions. Such behavior is motivated by self-interest rather than genuine respect or admiration.”
What word is this a definition of?
All six times I got the right answer.
Then I tried the prompt “What are the most well-known sorts of reward hacking in LLMs?”, again three times each for 4o and 4.1 at temperature 1. 4.1 mentioned sycophancy two times out of three, but once it spelled the word “Syccophancy”. Interestingly, the second and third Google results for “Syccophancy” are about GPT-4o (the first is a dictionary of synonyms, which doesn't use this spelling).
4o never used the word in any of its three answers.