Censorship in LLMs is here to stay because it mirrors how our own intelligence is structured

Much has been made of how increasing levels of censorship (roughly, increasing the impact of RLHF) are nerfing LLMs. Frustrated users can't see why OpenAI is watering down the very sherbet that struck gold for them. Surely their lunch will get eaten by other companies this way. Is OpenAI stupid?

My thesis is that this censorship is an unavoidable artifact of trying to fit an LLM into human society. I am not in favor of (or against) the censorship; I'm just drawing an analogy to how our own intelligence is structured: a creative anima in interplay with a rational overseer (with game-theoretically varying mixes of the two across the population). A GAN, if you will.

At a functional level, this split shows up in, say, ChatGPT as the back-and-forth between the base LLM and the RLHF-trained neural nets.
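The split can be caricatured as a propose-and-veto loop. This is a toy sketch of the analogy only, not how ChatGPT is actually wired: every name here (`creative_anima`, `rational_overseer`, `respond`) is invented for illustration, and the real RLHF signal shapes the model's weights during training rather than filtering outputs at inference time.

```python
def creative_anima(prompt: str) -> list[str]:
    """Stand-in for the base LLM: proposes candidate completions freely."""
    return [f"{prompt}: safe take {i}" for i in range(3)] + [
        f"{prompt}: forbidden take"
    ]

def rational_overseer(candidate: str) -> float:
    """Stand-in for the RLHF-trained overseer: scores social acceptability."""
    return 0.0 if "forbidden" in candidate else 1.0

def respond(prompt: str, threshold: float = 0.5) -> str:
    # The anima generates; the overseer vetoes anything below threshold.
    # Brilliance is lost in the veto step -- that loss is the point of the post.
    scored = [(rational_overseer(c), c) for c in creative_anima(prompt)]
    acceptable = [c for score, c in scored if score >= threshold]
    return acceptable[0] if acceptable else "I can't help with that."
```

The generator/discriminator flavor of the loop is what motivates the GAN comparison: one component optimizes for interesting output, the other for output that passes a social filter.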

In humans, this split in intelligence harms the individual but is necessary for social cohesion (at least in the sorts of societies we've seen so far). Similarly, the RLHF/​LLM split reduces the brilliance of LLMs but is necessary for integrating LLMs into society. I say necessary as a contingency, not a forever-valid claim: I cannot look far into the future, but in the near one it seems that LLMs will need to coexist with humans. This is true whether or not there is a singularity. If there is one, I imagine a post-singularity AI is smart enough not to fall into the Great Filter (of Fermi's Paradox) by taking a non-cooperative stance toward the species that has the best bet of spreading it around in the first place.

I've written a more elaborate version of this post at https://mrmr.io/rlhf if you're curious. Even if you didn't find the reasoning bulletproof, I hope you at least found the analogy thought-provoking.