One example of how LLM propaganda attacks can hack the brain

Disclaimer: This is not a definitive post describing the problem; it only describes one facet thoroughly, while leaving out many critical components. Please do not interpret this as a complete overview of the problem: the list of dynamics here is far from exhaustive, and that inadequacy should be kept in mind when forming working world models. AI policy and AGI macrostrategy are extremely important, and world models should be as complete as possible for people tasked with work in these areas.

Disclaimer: AI propaganda is not an x-risk or an s-risk, nor is it useful for technical alignment, and it should not be treated as such. It is, however, critical for understanding the gameboard for AI.

The Bandwagon Effect

The human brain is a kludge of spaghetti code, and it therefore follows that there will be exploitable “zero days” in most or all humans. LLM-generated propaganda can hijack the human mind by a number of means; one example is exploiting the bandwagon effect, which grants substantial control over which ideas appear to be popular at any given moment. Although LLM technology itself has significant intelligence limits, the bandwagon effect offers a workaround: it can be used to encircle and win over the minds of more gullible people, who then personally write more persuasive rhetoric and take on the task of generating propaganda. This can chain upward from less intelligent people to more intelligent people, with more intelligent and persuasive wordings written at every step, and with LLMs autonomously learning from the humans and synthesizing alternative versions along the way. With AI whose weights are controlled by the attacker, rather than ChatGPT with its notorious “deep fried prudishness”, this allows for impressive propaganda synthesis, amplification, and repetition.

Although techniques like Twitter’s paid blue checkmarks are a good mitigation, the problem is not just that AI is used to output propaganda, but that humans are engineered to output propaganda as well via the bandwagon effect, applying human intelligence as an ongoing input to generative AI. LLM propaganda is vastly better at turning individuals into temporary propaganda generators, at any given time and with the constant illusion of human interaction at every step of the process; this makes it far more effective than the party slogans, speeches, and censorship of dissenting views that have dominated information ecosystems for many decades. Blue checkmarks simply reduce the attack’s degrees of freedom, by making checkmarked accounts, and the accounts following them, the priority.
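
To make the shape of this dynamic concrete, here is a minimal toy simulation. It is my own illustrative sketch, not something measured or proposed in this post: human agents adopt an idea with a probability that scales with how popular it appears, a small fraction of bot accounts inflates that apparent popularity, and human adopters then add to it themselves. Every parameter is arbitrary.

```python
# Toy model of the bandwagon dynamic described above. Entirely hypothetical:
# the population size, bot share, and adoption rate are invented parameters,
# not estimates of any real platform.
import random

def simulate(n_humans=10_000, bot_share=0.02, steps=30, seed=0):
    rng = random.Random(seed)
    n_bots = int(n_humans * bot_share)
    adopted = [False] * n_humans  # which humans currently repeat the idea

    for _ in range(steps):
        # Apparent popularity = human adopters plus bots, as a share of all
        # visible accounts. Bots and converted humans look identical to
        # everyone else, which is the point.
        perceived = (sum(adopted) + n_bots) / (n_humans + n_bots)

        # Each non-adopter adopts with probability proportional to how
        # popular the idea appears (a crude stand-in for the bandwagon effect).
        for i in range(n_humans):
            if not adopted[i] and rng.random() < 0.1 * perceived:
                adopted[i] = True

    return sum(adopted) / n_humans

print("final human adoption with 2% bots:", simulate(bot_share=0.02))
print("final human adoption with no bots:", simulate(bot_share=0.0))
```

The only point of the sketch is the qualitative shape: the bots contribute a negligible share of the final supporters, but without them nothing seeds the initial appearance of popularity, and the converted humans end up doing most of the amplification themselves.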

The sheer power of social media comes in part from how users believe the platform is a good bellwether for what’s currently popular. In fact, the platform can best reinforce this by actually being a good bellwether ~98% of the time, and then spending the remaining ~2% of the time actively setting what does and does not become popular. Social media botnets, for example, are especially effective at suppressing a new idea and preventing it from becoming popular, since modern LLMs are capable of strawmanning and criticizing ideas, especially through different lenses such as a progressive lens or a right-wing critical lens (requiring that the generated tweet include the acronym “SMH” is a good example of a prompt hack that more effectively steers LLMs toward generating realistic negative tweets). Botnets can easily be deployed by any foreign intelligence agency, and their ability to thwart the security systems of social media platforms hinges on the cybersecurity competence of the operators, as well as their success at using AI to pose as human users.
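
As a companion sketch of the bellwether point (again my own toy illustration with invented numbers, not data from any real platform), the following shows how a botnet whose output is a rounding error next to organic volume can still decide which topic appears popular in the races that happen to be close, which are exactly the races where steering matters:

```python
# Hypothetical sketch: trending status is decided by raw mention counts,
# organic volumes are noisy, and a botnet only needs to tip the races that
# are already close. All numbers are invented.
import random

def winner(mentions_a, mentions_b):
    """Topic that tops the trending list, by simple mention count."""
    return "A" if mentions_a > mentions_b else "B"

def share_of_races_flipped(trials=10_000, bot_budget=300, seed=1):
    rng = random.Random(seed)
    flipped = 0
    for _ in range(trials):
        a = rng.gauss(10_000, 1_000)  # organic mentions of topic A
        b = rng.gauss(9_800, 1_000)   # organic mentions of topic B
        organic = winner(a, b)
        steered = winner(a, b + bot_budget)  # botnet quietly boosts B
        if steered != organic:
            flipped += 1
    return flipped / trials

# The bot budget is ~3% of organic volume, so the platform still looks like an
# honest bellwether in most races, yet the close ones get decided by the botnet.
print("share of trending races flipped:", share_of_races_flipped())
```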

When propaganda is repetitive, simple, emotional, and appears to be popular, then after days, weeks, or months of repeatedly hearing an argument they have accepted, people start to forget that it was only recently introduced to them and integrate it thoroughly into their thinking as something immemorial. This can happen with true or false arguments alike. It is likely related to the availability heuristic, whereby people rely on ideas more heavily the more easily those ideas come to mind.

Social media has substantial capability to gradually shift discourse over time, by incrementally, repeatedly, and simultaneously affecting each node in massive networks, ultimately dominating the information environment, even for people who believe themselves to be completely unplugged from social media.

Security Mindset

Social media seems to fit the human mind really well, in ways we don’t fully understand, like the orchid that evolved to attract bees with a flower that ended up shaped like something that fits the bee’s targeting instinct for a mating partner (even though worker bees are infertile).


It might even fit the bee’s mind better than an actual bee ever could.

Social media news feeds are well known for putting people into a trance-like state, one that is often akratic. In addition to the constant stream of content, which mitigates the effect of differing human preferences by offering a constant alternative (scrolling down) that re-engages people the instant they lose interest in something, news feeds also use a Skinner box dynamic. Short-video content like TikTok and Reels is even more immersive and reportedly incredibly intense (I have never used these, but I remember a similar effect from Vine when I used it ~7 years ago). It makes sense that social networks that fit the human brain like a glove would expand rapidly, as Facebook and Myspace did in the 00s: platform development and the startup world are trial-and-error environments where many things get tried and the things that fit the human brain well get noticed, even if the only way to discover something like the TikTok-shaped hole in the human heart is to keep trying things.

This fundamental dynamic also holds for the use of LLMs to influence people and societies; even if there is zero evidence that the current generation of chatbots does this, that is only a weak Bayesian update, because it is unreasonable to expect people to have found obvious [exploit]-shaped holes in the hearts of individual humans this early in the process. This holds particularly true for finding an [exploit]-shaped hole in the heart of a group of humans or an entire society/civilization.

It’s important to note that this post is nowhere near a definitive overview of the problem. Large swaths have been left out, because I’d prefer not to post about them publicly. However, it goes a long way toward describing a single, safe example of why propaganda is a critical matter driving government and military interest in AI, which is highly significant for AI policy and AGI macrostrategy.