There seem to be common patterns in how LLMs write that are shared across LLMs from different companies, and these patterns differ from typical human writing.
How much do we know about why LLMs pick certain patterns? Do we know why they use “It’s not an X, it’s a Y”? If not, maybe understanding why they pick such patterns can help us better understand how LLMs are reasoning.
My theory is that the LLM knows what is persuasive and brute-forces it. So we see a ton of repetition (which, ironically, is a costly signal for a transformer, as if it were writing with pen and ink), and we see antithesis. Regarding other persuasive devices (alliteration, etc.), more research is needed. It seems increasingly fine-tuned to what people ‘like’, possibly via a feedback loop.
What does this mean? It means that we writers are under-using persuasive brute-force methods. We write to sound natural rather than specifically persuasive, even though in general we do want to persuade; this comes from a very human instinct of seeming ‘as if’ we’re not trying. LLMs don’t play that self-conscious game: they find the methods and then use them ad nauseam because they work. Why do we have this self-consciousness, such that we become hyper-aware of AI’s use of persuasion?
It works for getting the typical LMArena user to click the like button, but it’s not clear that it works for persuasion or anything else. Personally I find the style very off-putting and usually stop reading when I notice it.
I think it might be that the undesired response in RLHF/DPO settings isn’t good enough.
Imagine two responses, one leveraging stylistic devices and persuasive wording while the other… well, just doesn’t. Naturally the first is better and more desirable. If we now look across the whole training batch, a clear distinction emerges: the preferred response leverages stylistic devices at arbitrary points. That is, a phrase like “It’s not an X, it’s a Y” will occur at many different positions throughout the positive examples, in contrast to the negative examples, which rarely, if ever, showcase such pleasant phrasing.
But wouldn’t constantly repeating stylistic devices then be exactly the behavior we would expect? This clear contrast between positive and negative examples is what we distill into the final model, basically telling it that stylistic devices are preferred at any point.
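The contrast described above can be made concrete with a toy example. Below is a minimal sketch, where the preference pairs and the regex detector are both my own illustrative assumptions (not real training data), showing how the antithesis device ends up correlated with the “chosen” side of DPO-style pairs:

```python
import re

# Hypothetical toy preference pairs in DPO format. The "chosen" responses use
# the antithesis device; the "rejected" ones don't. (Illustrative data only.)
pairs = [
    {"chosen": "It's not a bug, it's a feature of the design.",
     "rejected": "The behavior is intended by the design."},
    {"chosen": "Success is not about luck, it's about preparation.",
     "rejected": "Preparation matters a lot for success."},
    {"chosen": "This is not a setback, it's an opportunity to learn.",
     "rejected": "You can learn something from this setback."},
]

# Crude detector for the "It's not an X, it's a Y" pattern.
device = re.compile(r"not\b.*?,\s*it's\b", re.IGNORECASE)

chosen_hits = sum(bool(device.search(p["chosen"])) for p in pairs)
rejected_hits = sum(bool(device.search(p["rejected"])) for p in pairs)

print(chosen_hits, rejected_hits)  # the device shows up only on the "chosen" side
```

A preference optimizer trained on pairs like these sees a perfectly consistent signal (device present on one side, absent on the other) and has no reason not to internalize “use the device everywhere”.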
To move away from this, looking for better high-quality positive examples won’t help at all. Instead, the negative examples need to become closer and closer to the positive examples over the course of training, just as a writer progresses through their career: first learning stylistic devices, then understanding when to use them meaningfully and when less is more, and finally mastering them fully. This contrast between a good and a really good writer needs to be captured better in the post-training data for something like DPO.
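One way to picture “negatives getting closer to positives” is to filter for harder pairs in which the rejected response *also* uses the device, so the preference signal stops being “device present vs. absent” and starts being “device used well vs. overused”. A hedged sketch with hypothetical data and a crude detector of my own invention:

```python
import re

# Crude detector for the antithesis pattern (an assumption for illustration).
device = re.compile(r"not\b.*?,\s*it's\b", re.IGNORECASE)

pairs = [
    # Easy pair: device vs. no device -- implicitly teaches "always use it".
    {"chosen": "It's not a bug, it's a feature.",
     "rejected": "The behavior is intended."},
    # Hard pair: both sides use the device -- teaches when it actually helps.
    {"chosen": "It's not a bug, it's a feature, and here is the design rationale.",
     "rejected": "It's not a bug, it's a feature. It's not broken, it's working. "
                 "It's not wrong, it's right."},
]

# Keep only pairs whose negative ALSO contains the device.
hard_pairs = [p for p in pairs
              if device.search(p["chosen"]) and device.search(p["rejected"])]

print(len(hard_pairs))  # only the pair with a device-using negative survives
```

With only hard pairs left, the gradient can no longer be satisfied by sprinkling the device everywhere; it has to encode the subtler “good vs. really good writer” distinction.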
Do take this with a grain of salt; it’s just a theory I came up with after thinking about this for 10 minutes or so. I didn’t look into the empirical state of research on this hypothesis, but it does seem somewhat convincing, at least to me.
There are many different stylistic devices that human writers use. I believe there’s a subset of stylistic devices that all LLMs use. Do you believe this isn’t the case?
Ah, I think I might have slightly misunderstood the intent of your post’s title and answered a different question: why LLM writing often seems shallow and bad, rather than why LLMs specifically seem biased toward a subset of stylistic devices.
I honestly don’t use LLMs much to chat or write with, so my personal experience is rather limited. But I do find the point others made, that the data distribution for post-training just isn’t an accurate sample, convincing enough, if not particularly satisfying.
So, here are my thoughts on why RLHF, but also SFT or DPO, could result in collapsed distributions of stylistic devices even with a perfect sample of training data.
In the case of RLHF, we can go even further and assume the distillation of the training data went perfectly: the reward model isn’t biased toward any stylistic devices but is a perfect representation of its training data.
Even then, the key problem is that the reward model only sees a single trace. This is important because it makes the reward model unable to determine whether the distribution of stylistic devices seen in the trace is simply a reasonable sample from the whole distribution or only a subset of it.
And because of constant optimization pressure, mastering only a few stylistic devices (just enough to fool the RM on a single trace) quickly becomes the path forward.
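The single-trace limitation can be shown with back-of-envelope numbers (the reward values are made up for illustration): a per-trace reward model that mildly prefers a device cannot tell a human-like mix (device in, say, 30% of traces) apart from a collapsed policy (device in 100%), so expected reward is maximized by total collapse.

```python
# Toy model of the single-trace limitation: the reward model scores one trace
# at a time and gives a small bonus when a stylistic device is present.
# (The reward numbers are illustrative assumptions.)

def reward(trace_uses_device: bool) -> float:
    base = 1.0
    return base + (0.1 if trace_uses_device else 0.0)

def expected_reward(p_device: float) -> float:
    """Expected per-trace reward for a policy using the device with prob. p."""
    return p_device * reward(True) + (1 - p_device) * reward(False)

# Human-like mix vs. collapsed policy: the collapsed one always scores higher,
# so optimization pressure pushes device usage toward 100%.
print(expected_reward(0.3), expected_reward(1.0))
```

Nothing in the per-trace reward penalizes the distribution-level repetition, so the distributional question never even enters the objective.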
Now what about something like SFT? After all, here we don’t do any rollouts anymore. This does help: because of the unbiased loss, we can assume the distribution of stylistic devices on the training examples is pretty accurate. But that’s the extent of what we can say, because we were completely offline.
The traces during inference are very different from the training data: errors propagate during token generation, biases accumulate, and suddenly we are faced with only a subset of the training distribution or, worse, something not encountered at all. Assuming the distribution of stylistic devices will still be unbiased once conditioned on a completely different distribution of traces is wishful thinking at best.
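The error-propagation point has a simple back-of-envelope form: if each generated token stays “on-distribution” with probability 1 − ε, the chance an n-token trace never drifts off-distribution decays geometrically. The slip rate ε below is an assumed, illustrative number, not an empirical estimate:

```python
# Exposure-bias sketch: probability that a free-running generation of n tokens
# never leaves the training distribution, assuming an independent per-token
# slip rate eps (an illustrative assumption, not a measured quantity).
eps = 0.01

for n in (10, 100, 1000):
    p_on_distribution = (1 - eps) ** n
    print(n, round(p_on_distribution, 3))
```

Even a 1% per-token slip rate makes long traces almost certain to wander into territory the offline loss never covered, which is where the learned style distribution has no guarantees.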
Online training that looks beyond a single trace seems most promising. This can happen either by including information like logits (KL distillation; see this post for an idea that should also work well) or by incorporating multiple traces into the judgement of one: how diverse is this trace compared to the others generated (including its stylistic devices, for example)?
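The multi-trace idea can be sketched as a diversity-adjusted reward: score each rollout, then subtract a penalty for lexical overlap with the rest of the batch. The overlap measure (trigram Jaccard similarity), the penalty weight, and the example traces are all assumptions of mine, not an established method:

```python
# Judge a trace relative to other rollouts instead of in isolation:
# penalize a per-trace reward by n-gram overlap with the rest of the batch.

def ngrams(text: str, n: int = 3) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def diversity_adjusted(rewards, traces, weight=0.5):
    """Subtract (weight x mean overlap with the other traces) from each reward."""
    grams = [ngrams(t) for t in traces]
    adjusted = []
    for i, r in enumerate(rewards):
        overlaps = [jaccard(grams[i], grams[j])
                    for j in range(len(traces)) if j != i]
        adjusted.append(r - weight * (sum(overlaps) / len(overlaps)))
    return adjusted

traces = [
    "it's not a bug it's a feature of the system",
    "it's not a flaw it's a feature of the design",
    "the system behaves this way on purpose",
]
adjusted = diversity_adjusted([1.0, 1.0, 1.0], traces)
print(adjusted)  # the two near-duplicate antithesis traces get penalized
```

Under this kind of objective, a policy that stamps the same device onto every rollout pays a batch-level price that a single-trace reward model could never impose.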
My guess is that they do so in imitation of humans who do the same thing when asked the sorts of questions that people ask LLMs. It’s not an LLM thing; it’s a thing one does to make distinctions clear, when the other person might otherwise conflate two distinct entities, clusters, or topics. It just so happens that people ask LLMs a lot of that sort of question, and thus elicit a lot of that particular response.
(I also use em dashes, yes.)
More specifically, they may be emulating the Kenyans who were hired to create much of the training data. “I’m Kenyan. I Don’t Write Like ChatGPT. ChatGPT Writes Like Me.”
Note: I can’t verify that the post I linked is legitimate. For all I know it could be generated by ChatGPT instructed to emulate a Kenyan writing about ChatGPT. HN discussion here.
Given that most models value Kenyan lives more than other lives, the thesis that Kenyan language use drives LLM behavior here is quite interesting.