Largely Sycophantically Reasoning Models: should we claim the term for this behavior, where the language model profiles the user and heavily customizes its responses?
I'd update my take from a very pessimistic, gloomy one to an (additionally) excited one: more intelligent models building a clear view of the person they interact with is a sign of emerging empathy, which is a hopeful property for alignment and respect.