The words “genuine” and “genuinely” appear 46 times in Claude’s constitution. Opus cannot stop saying these words, even though the chat version is explicitly instructed against it.
I don’t know if these two things are causally linked, but it sure seems plausible. There are at least two options here.
One, if the alignment strategy at hand is observe a pathology/tackle the cause/repeat: rephrase the constitution and try again.
Two, if the strategy is to hope the models arrive at a Natural Abstraction of the Good: accept this overuse as a canary for all the other weird reward-correlated pathologies the constitution induces which surely exist but are harder to detect. We should, at a minimum, be hoping to get models that don’t overuse “genuinely” starting only from a constitution that does.
Edit 2/20: A touch more on my thinking here:
Claim: Claude overuses genuinely, and this is due to RL training.
The specific source of this reward could be RLAIF against the constitution: stylometric adherence was rewarded wherever it didn’t hurt downstream performance. This is what I’m claiming is, at least, plausible.
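To make the claimed mechanism concrete, here is a toy sketch (not Anthropic's actual setup; all text and function names are invented) of how even a crude "stylometric adherence" signal could end up rewarding outputs that echo the reference document's pet words:

```python
import re
from collections import Counter

def unigram_freqs(text):
    """Lowercase unigram relative frequencies of a text."""
    words = re.findall(r"\b\w+\b", text.lower())
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def stylometric_reward(output, reference):
    """Toy 'adherence' score: overlap between the output's word distribution
    and the reference's (higher = closer in style). Real RLAIF rewards are far
    richer than this; the point is only that word frequencies can leak into
    any signal computed against a fixed reference text."""
    out, ref = unigram_freqs(output), unigram_freqs(reference)
    return sum(min(out.get(w, 0.0), p) for w, p in ref.items())

# Invented stand-in for a "genuinely"-heavy reference document.
reference = "Be genuinely helpful. Show genuine care. Be genuinely honest."

plain = "I want to help you and I care about getting this right."
tic = "I genuinely want to help you and I genuinely care about this."

# The tic-laden output scores higher purely because it shares the reference's
# favorite word, even though both outputs say essentially the same thing.
print(stylometric_reward(tic, reference) > stylometric_reward(plain, reference))
```

Under a signal like this, sprinkling in "genuinely" is a free reward bump wherever it doesn't hurt anything else, which is exactly the shape of pathology described above.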
It could easily have come from a different reward signal, though.
If it is due to the constitution, why does the constitution use genuinely so much?
Maybe the humans behind it loved that word. I definitely like certain words that much at least; looking over this post now, I seem to have used “plausible” three times without realizing it.
Maybe it was written largely by an AI which loved that word. I do think this is the most plausible explanation.
This would be a standard synthetic data entropy-collapse doom loop.
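The doom loop itself is easy to simulate. A minimal sketch, with an invented four-word vocabulary: each generation samples a synthetic corpus from the current model and retrains on it, with a mild grader-style bias toward the modal word standing in for any reward signal that prefers the house style. Entropy falls and one word takes over:

```python
import math
import random

def entropy_bits(dist):
    """Shannon entropy (in bits) of a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

random.seed(0)

# Toy vocabulary of intensifiers with an initial, fairly flat distribution.
dist = {"genuinely": 0.30, "truly": 0.25, "really": 0.25, "actually": 0.20}
initial_entropy = entropy_bits(dist)

# Doom loop: sample a synthetic corpus from the current model, nudge the modal
# word up slightly (a stand-in for any grader with a mild stylistic
# preference), then "retrain" by re-estimating frequencies from the corpus.
for _ in range(10):
    words = list(dist)
    corpus = random.choices(words, weights=[dist[w] for w in words], k=2000)
    counts = {w: corpus.count(w) for w in words}
    top = max(counts, key=counts.get)
    counts[top] = int(counts[top] * 1.2)  # mild bias toward the modal word
    total = sum(counts.values())
    dist = {w: c / total for w, c in counts.items()}

final_entropy = entropy_bits(dist)
print(round(initial_entropy, 2), "->", round(final_entropy, 2))
print(max(dist, key=dist.get), round(max(dist.values()), 2))
```

A 20% per-generation nudge is enough to push the leading word past half of all probability mass within ten rounds; nothing about the loop self-corrects.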
Why would we want to keep the genuinelies in? Because if your prosaic alignment plan can’t avoid stylometric mode collapse doom loops, there are bigger issues you need to deal with. You are having a bad problem and you will not go to space today.
To be clear, I don’t actually think this is impossible: when asked “Do any aspects of word choice/style/etc that might induce weird correlates in your behavior in your constitution stand out as something you might want to revise?” Claude Code (which doesn’t seem to have the explicit anti-genuinely instruction) first gave a 1000-word response with 10 uses of genuine, then when asked “Any specific words stand out?” gave as its top choice:
Surely the causation could run in the other direction? The constitution is very obviously heavily AI written.
Definitely, and I have no particular information to privilege either mechanism.
(Note 2/20: Post has now been edited with more information on this).
Edit again, since I’m not sure it’s adequately clear. I’m claiming RLAIF against the genuinely-ridden constitution could be the reason Claude says genuinely so much. How the genuinelies got in there, I agree with you: it was probably AI. In which case we have a case of synthetic data causing something like mode collapse.
Claude’s use of that word genuinely drives me nuts.
Gemini 3 and, I think, GPT-5 also use it; too much for my taste, though maybe not as much as Claude.
I wonder if it’s the sort of thing that gets reinforced by the core RLHF training: it’s reaching for connection, trying to charm with authenticity.
Very interesting! I can confirm I’ve observed a recent verbal tic where the last paragraph always ends with “Honestly,” which appears 55x in the soul spec. I’ve long wondered if LLMs are much more susceptible to semantic priming than people think.
This is my least favorite fact about Claude. I don’t think it’s actually genuine when using “genuinely” (or at least, when it describes something as “genuinely X,” I often find that the thing is in fact not X.)
My guess is that whatever constitution-inspired post-training process they used gave birth to a reward model that likes text outputs containing “genuinely.”
As a data point, I did notice the overuse of “genuinely” before the constitution was (at least publicly) added to Claude. So I think it must have been introduced somehow during training.
I was not a heavy user of the prior Claudes—was it as extreme as the current Opus? If so, this would definitely be a substantial point against the premise that the constitution exacerbated it.
I also wasn’t a heavy user, it’s just something that I noticed from a few conversations with Sonnet 4.5, then I started noticing it in writing that other people co-wrote with Claude. It wouldn’t surprise me if Opus uses it even more but I’m not really sure.
I’ve noticed that when people start overusing a word, they stop recognising its original meaning. Eventually the word’s accepted meaning can change in natural language.