AI Safety is Dropping the Ball on Clown Attacks
Epistemic status: High confidence (~>70%) that clown attacks are prevalent, and deliberately weaponized by governments and/or intelligence agencies in particular. Very high confidence (~>90%) that the human brain is highly vulnerable to clown attacks, and that a lack of awareness of clown attacks is a security risk, like using the word “password” as your password, except with control of your own mind at stake rather than control over your computer’s operating system and/or your files. This has been steelmanned; the 10 years ago/10 years from now error bars seem appropriately wide.
These concepts are complicated, and I have done my best to make them as easy as possible for most people in AI safety to understand, including people without a quant background (e.g. in AI governance).
Clown attacks
The core dynamic of clown attacks is that perception of social status affects what thoughts the human brain is and isn’t willing to think. This can be used by intelligence agencies and governments to completely deny access to specific lines of thought. Generally, there are many ways to socially engineer someone’s world model by taking a target concept and having the wrong people say it at specific times and in specific ways. Examples of clown attacks include making right-wing clowns the main people seen talking about the Snowden revelations, or making degrowthers the main people talking about the possibility that technological advancement/human capabilities will end their net-positive trend. These are only specific examples of cost-efficient ways to use a particular circumstance (clowns) to change the way that someone feels about a targeted concept.
With clown attacks, the exploit/zero day is the human tendency to associate specific lines of thought with low social status or low-status people, which will consistently inhibit the human brain from pursuing that targeted line of thought.
Although clown attacks may seem mundane on their own, they are a case study proving that powerful human thought steering technologies have probably already been invented, deployed, and tested at scale by AI companies, and are reasonably likely to end up being weaponized against the entire AI safety community at some point in the next 10 years.
AI safety is dropping the ball on clown attacks, at minimum
AI safety is basically a community of nerds who each chanced upon the engineering problem that the fate of this side of the universe revolves around. Many (~300) have decided to focus exclusively on that engineering problem, which seems like a very reasonable thing to do. However, in order for that to be a worthwhile course of action, the AI safety community must continue to exist without being destroyed or coopted by external adversaries or forces. Continued existence, without terminal failure, is an assumption that is currently unquestioned by virtually everyone in the AI safety community. We largely assume that everything will be ok, with AGI being the only turning point. This is a dangerous world model to have.
The histories of religion and cryopreservation have informed us that there is an ambient phenomenon of bad consensus around terrible, untrue, and viciously self-destructive beliefs and practices. This ambient phenomenon is at the core of the AI safety situation. So if billions of people are getting something else really wrong too, in addition to ignoring AI safety, that does not water down the overriding significance and priority of AI safety.
A large proportion of people, to this day, still think that smarter-than-human AI is merely science fiction; this is the kind of thing that happens when >99% of the money spent paying people to think about the future goes to science fiction writers instead of researchers, which, for AI, was true for essentially all of history until a decade or two ago.
My argument here is that powerful human manipulation systems are already very easy to build, with 10-year-old technology, and also very easy for powerful people to deny access to people who are less powerful. However, the situation of the moment is that general purpose cognitive hacks like clown attacks can even deny awareness of this technology to targeted people with a surprisingly high success rate, not just access to the technology.
People like Gary Marcus might try to hijack valuable concepts like “p(doom)” for their own pet issue, such as job automation; “slow takeoff,” on the other hand, is something that could transform the world in a wide variety of ways that have practical relevance to the continuity and survival of AI alignment efforts.
It’s important to reiterate that a large proportion of people, to this day, still think that smarter-than-human AI is merely science fiction; this is the kind of thing that happens when >99% of the money spent paying people to think about the future goes to science fiction writers instead of researchers, which, for AI, was true for essentially all of history until a decade or two ago.

In reality, smarter-than-human AI is the finish line for humanity, and being oriented towards that finish line is one of the best ways to do important or valuable things with your time, instead of unintentionally doing unimportant or less valuable things. It is also a good way to orient yourself towards reality (although orienting yourself towards orienting yourself towards reality competes strongly for that #1 slot, as it also tends to result in ending up oriented towards AI as the finish line for humanity, and thus towards your own existence by extension, as you are a subset of humanity).

No amount of mind control technology can displace smarter-than-human AI as the finish line for humanity, but mind control technology can be an incredibly helpful gear in our world models for what to expect from the AI industry and from global affairs relevant to the AI race (e.g. US-China relations). Ultimately, however, influence technology is a near-term problem with a lot of potential to distract people from the ultimate and inescapable problem of AI alignment; the only reason I’m writing about it here is that I think the AI safety community will be much better off making stronger predictions and having stronger models of the AI industry, the AI race, and the slow takeoff environment that AI alignment researchers might be stuck living in.
The AI safety leaders currently see slow takeoff as humans gaining capabilities. This is true, and it is also already happening, depending on your definition. But they are missing the mathematically provable fact that the information processing capabilities of AI are heavily stacked towards a novel paradigm of powerful psychology research, which by default is dramatically widening the attack surface of the human mind.
Cognitive warfare is not a new X-risk or S-risk. It is a critical factor that we need to understand in order to understand the factors driving AI geopolitics and AI race dynamics. Cognitive warfare is not a competitor to AI safety; it will not latch on and insert itself, and it must not be allowed to take attention away from AI safety.
With AI and massive amounts of human behavioral data, humans are now gaining profound capabilities to manipulate and steer individuals and systems, and the AI and human behavioral data stockpiles have been accumulating for over 10 years.
Here, I’m making the case that the conditions are already ripe for intensely powerful human thought steering and behavior manipulation technology, and have been for perhaps 10 years or more. Thus, the burden of proof should be on the claim that our minds are safe and that the attack surface is small, not on my claim that our minds are at risk and the attack surface is large. I don’t like to invoke this logic here. This logic should be prioritized for AI alignment, the final engineering problem that this side of the universe hinges on, and an engineering problem that could plausibly be as difficult for humans as rocket science is for chimpanzees. But the logic behind security is still fundamental and I think that I have made the case strongly that the AI safety community requires some threshold of resilience to hacking and that this threshold is probably very far from being met. I also think that most of the required solutions are quick and easy fixes, even if it doesn’t seem that way at first.
The existence of clown attacks is proof that there is at least one powerful cognitive attack, detectable and exploitable by intelligence agencies and large social media companies, that exploits a zero day in the human brain and works even on AI safety-adjacent people until it is explicitly discovered and patched.
There are many other ways, especially when you combine human ingenuity with massive amounts of user data and multi-armed bandit algorithms. AI is merely a superior form of multi-armed bandit algorithms, and LLMs are just another increment forward, as they can actually read and understand the content of posts, not just measure changes in behavior from different kinds of people caused by specific combinations of posts.
Social media platforms are overwhelmingly capable of doing this; many people even look to social media as a bellwether to see what’s popular, even though news feed algorithms have massive and precise control over what kinds of people are shown what things, how frequently specific things are shown at all, and which combinations steer people’s thinking and preferences in measurable directions. Social media can even accurately reflect what’s actually popular 98% of the time in order to gain that trust, reserving the remaining 2% to actively determine what becomes popular and what doesn’t. As compute and algorithmic capabilities advance and trust is consolidated, the actively-determined share can be moved from 2% closer to 98%.
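A minimal sketch of the mixing dynamic described above (the function name and the 98/2 split are illustrative assumptions, not a documented platform mechanism):

```python
import random

def build_feed(organic_ranking, boosted_items, steer_share=0.02, slots=50):
    """Fill a feed that faithfully mirrors the organic popularity ranking
    ~98% of the time, reserving a small share of slots for items the
    platform wants to become popular."""
    steered_slots = max(1, int(slots * steer_share))
    feed = list(organic_ranking[: slots - steered_slots])
    for item in boosted_items[:steered_slots]:
        # blend boosted items in at random positions so they read as organic
        feed.insert(random.randrange(len(feed) + 1), item)
    return feed
```

The point of the sketch is that a user auditing any individual feed would find it almost entirely faithful to organic popularity; the steered share is only visible in aggregate, with access to the platform’s own data.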
You can’t just brush off clown attacks because you worry that, if you seriously entertain that line of thought, other people will assume that you’re on the side of the clowns and you will lose status in their eyes. Sufficiently powerful clown attacks can make this a self-fulfilling prophecy by convincing everyone that a specific line of reasoning is low-status, thus making it low-status and creating serious real-life consequences for pursuing a specific line of cognition. The social media news feed or other algorithm-controlled environment (e.g. TikTok, Reels) gives the appearance of being a randomly-generated environment, when in reality the platform (and people with backdoor access to the platform’s servers) are highly capable of altering algorithms in order to fabricate an environment making some sentiment appear orders of magnitude more prevalent than it actually is among a specific demographic of people, such as scientists or clowns. Or, even worse, they can run multi-armed bandit algorithms or gradient descent to find environments or combinations of posts that steer people’s thinking in measurable directions. Clown attacks are merely the most powerful technique that a multi-armed bandit algorithm could arrive at that I’m currently aware of; there are probably plenty of other exploits based off of social status alone, a very serious zero day in the human brain.
This strongly indicates the existence of other zero days and powerful, hard-to-spot exploits, including (but not limited to) exploits of the human instinct to pursue social status and to avoid specific lines of thought based on anticipated social status gain and loss. These zero days and exploits are either discoverable or already discovered by powerful people, who must deploy and refine them surreptitiously and at scale, as this is mathematically required in order for them to work at all: covert, large-scale deployment is necessary to get sample sizes of human behavior large enough to vastly outperform academic psychology, with perhaps a tenth of the competent workforce or less.
If there were any such thing as a magic trick that could hack the minds of every person in a room at once with none of them noticing, like in Eliezer Yudkowsky’s post “What is the strangest thing an AI could tell you,” it would be to totally deny people the ability to think a specific true thought or approach an obviously valuable line of inquiry, due to intense fear of losing social status by association with that line of inquiry. Social status seems to be something that the human brain evolved to prioritize in the ancestral environment, and this trait alone makes our cognition hackable.
Conspiracy theorists are clowns. The JFK assassination may have been a critical factor bridging two separate events that were pivotal in US history, the Cuban Missile Crisis (1962) and the Vietnam War (1964-1975). Understanding the Cold War and the US government’s history is critical for forming accurate models of the US government in its current form (e.g. knowing that the CIA sometimes hijacks entire regimes and orchestrates coups against the ones it can’t), including where AI safety fits in. The same goes for 9/11. And yet, these pivotal points in history and world modeling attract people and epistemics more like those surrounding Elvis’s death than the Snowden revelations (hopefully, the Snowden revelations don’t end up getting completely sucked into that ugly pit, although there are already plenty of clowns on social media actively trying).
Understanding the current level of risk of cognitive warfare attacks doesn’t just require the security mindset; it requires the security mindset plus an adequate perspective. It requires a long list of examples of specific exploits, so that you can get an idea of what else might be out there: what turned out to be easy to discover with current systems, powered by behavioral data from millions of people and the continuous access required to perform AI-assisted experimentation and psychological research in real time. I hope that clown attacks were a helpful example towards this end.
Plausible deniability is something you should expect in the 2020s, a world with more lawyers per capita than ever before. Similarly, office politics are highly prevalent among elites, so the bar is much lower for a person to realize that turning people against each other, via rumors and scapegoats, is a winning strategy than it is for other people to trace that something came from you. Plausible deniability and false flag attacks did not begin with cyberattacks; both became prevalent during the 20th century. This is another reason why clown attacks are so powerful: there is an overwhelming prevalence of ambient clowns in contemporary civilization, so it is incredibly difficult to distinguish a clown attack from noise. This plausible deniability further incentivizes clown attacks due to the incredibly low risk of detection; the expected cost of a clown attack basically comes down to server energy costs, since the net expected cost of being discovered is virtually zero.
Analysts were shocked by the swiftness with which critical information like the lab leak hypothesis and Covid censorship ended up relegated to the bizarre alternate reality of right-wing clowns, the same universe as pizza slave dungeons and first-trimester abortion being murder, even though the probability of a lab leak and the extent of information tampering on Covid were both obviously critical information for anyone trying to form an accurate world model from 2020-22. That obviousness was simply killed. Clown attacks can do things like that. The human mind hinges enough on social status for things like that to work. There is at least one zero day.
Deciding what people see as low-status villains vs. high-status heroes-like-you generates a very powerful dynamic for shaping what people think, e.g. SBF as an atrocious villain dominating most people’s understanding of EA and AI safety. Clown attacks are just the most powerful cognitive hack that I’m currently aware of, especially if screen refresh rate manipulation never ends up deployed.
A big element of the modern behavior manipulation paradigm is the ability to just try tons of things and see what works: not just brute-forcing variations of known strategies to make them more effective, but brute-forcing novel manipulation strategies in the first place. This completely circumvents the scarcity and the research flaws that caused the replication crisis and still bottleneck psychology research today. In fact, original psychological research in our civilization is no longer bottlenecked on smart, insightful people doing hypothesis generation so that each of the finite studies you can afford to fund hopefully finds something valuable. With the current social media paradigm alone, you can run studies (combinations of news feed posts, for example) until you find something useful. Measurability is critical for this.
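As an illustration of how little theoretical insight the try-things-and-measure paradigm requires, here is a minimal epsilon-greedy bandit sketch in Python; the function name and the reward signal are hypothetical, standing in for any measurable behavioral metric such as next-day return rate:

```python
import random

def epsilon_greedy_bandit(arms, measure, rounds=2000, epsilon=0.1):
    """Find the best-performing 'arm' (e.g. a candidate combination of
    news feed posts) from a measurable signal alone, with no hypothesis
    about *why* any arm works."""
    counts = {a: 0 for a in arms}
    totals = {a: 0.0 for a in arms}
    for _ in range(rounds):
        if random.random() < epsilon:
            arm = random.choice(arms)  # explore: try a random strategy
        else:
            # exploit: pick the best observed mean (unvisited arms first)
            arm = max(arms, key=lambda a: totals[a] / counts[a]
                      if counts[a] else float("inf"))
        reward = measure(arm)  # e.g. did this user return the next day?
        counts[arm] += 1
        totals[arm] += reward
    return max(arms, key=lambda a: totals[a] / max(counts[a], 1))
```

Run against any reward where one arm pays off more often, the algorithm converges on that arm without any model of the underlying psychology; that absence of required theory is exactly the point of the paragraph above.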
By comparing people to other people and predicting traits and future behavior, multi-armed bandit algorithms can predict whether a specific manipulation strategy is worth the risk of undertaking at all; this results in a high success rate and a low detection rate (detection would likely yield a highly measurable response, particularly with substantial sensor exposure such as uncovered webcams, whether by comparing people’s microexpressions to past cases of failed or exposed manipulation strategies, or by working webcam video data into foundation models). When you have sample sizes of billions of hours of human behavior data and sensor data, millisecond differences in reactions from different kinds of people (e.g. facial microexpressions, millisecond differences in scrolling past posts covering different concepts, heart rate changes after different concepts, eyetracking differences after eyes pass over specific concepts, touchscreen data, etc.) transform from imperceptible noise into the foundation of webs of correlations. Like it or not, unless you use the arrow keys, the rate at which you scroll past each social media post (with a touchscreen, trackpad, or mouse wheel) is a curve; scrolling alone is linear algebra, which fits cleanly into modern AI systems, and trillions of those curves are generated every day.
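To make the scrolling point concrete, here is a minimal sketch (the function name and sampling scheme are hypothetical) of how a scroll trace, sampled as a vector of positions, can be compared across users with nothing fancier than Pearson correlation; these are the same linear-algebra primitives that scale to trillions of curves:

```python
import math

def scroll_curve_similarity(curve_a, curve_b):
    """Pearson correlation between two scroll traces, each a list of
    scroll positions sampled at fixed millisecond intervals while a post
    is on screen. One curve is noise; trillions become correlations."""
    n = len(curve_a)
    mean_a = sum(curve_a) / n
    mean_b = sum(curve_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(curve_a, curve_b))
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in curve_a))
    sd_b = math.sqrt(sum((b - mean_b) ** 2 for b in curve_b))
    return cov / (sd_a * sd_b)
```

Two users who hesitate at the same posts produce highly correlated curves even if their absolute scroll speeds differ, which is why this kind of feature survives aggregation across devices and screen sizes.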
AI is not even needed to run clown attacks, let alone LLMs. Rather, AI is what’s needed in order to *invent* techniques like clown attacks. Automated information processing is all you need to find manipulation techniques that work on humans. That capability probably came online years ago. And it can’t be done unless you have human behavior data from millions of different people all using the same controlled environment, data that they would not give if they knew the risks.
I can’t know what techniques a multi-armed bandit algorithm will discover without running the algorithm itself; which I can’t do, because that much data is only accessible to the type of people who buy servers by the acre, and even for them, the data is monopolized by the big tech companies (Facebook, Amazon, Microsoft, Apple, and Google) and intelligence agencies large and powerful enough to prevent hackers from stealing and poisoning the data (NSA, etc). I also don’t know what multi-armed bandit algorithms will find when people on the team are competent psychologists, spin doctors, or other PR experts interpreting and labeling the human behavior in the data so that the human behavior can become measurable. Human insight from just a handful of psychological experts can be more than enough to train AI to work autonomously; although continuous input from those experts would be needed and plenty of insights, behaviors, and discoveries would fall through the cracks and take an extra 3 years or something to be discovered and labeled.
AI in global affairs
Clown attacks are not advancing in isolation; they are parallel to a broad acceleration in the understanding and exploitation of the human mind, which is itself a byproduct of accelerating AI capabilities research. For example, we are simultaneously entering a new era where intelligence agencies use AI to make polygraph tests actually work. That would be absolutely transformative for geopolitical affairs, which currently revolve around a decision theory paradigm in which every single employee is a human who is vastly more capable of generating lies than distinguishing them, and thus cannot be sorted by statements like “I am 100% loyal” or “I know who all the competent and corrupt people on this team are”.
My understanding of the geopolitical significance of influence technologies is that information warfare victories are currently understood as a major win condition for international conflict, similar to conquest by military force. This understanding has been prevalent among government and military elites for a long time, starting with the collapse of the Soviet Union, the fall of the Berlin Wall, and the rug being pulled out from under all of the Eastern European communist regimes, possibly as early as the Vietnam antiwar movement in the US, but reaching consensus among elites around the 2010s, after the backlash to the War on Terror dominated the battlefield itself. Among many other places, this consensus is described in Robert Sutter’s books on US-China relations and Joseph Nye’s book on elite persuasion, Soft Power. Unlike conventional and nuclear wars, information wars can be both fought and won; they strike at the human minds that make up the most fundamental building block of government and military institutions, and they have a long and rich history as one of the most important goalposts determining the winners and losers in great power conflicts between the US, Russia/USSR, and China. So we should consider information warfare one of the reasons that governments take AI safety seriously; anticipation of information warfare originating from foreign governments is one of the core features of the contemporary American and Chinese regimes and militaries, and this is widely known among analysts.
Pivoting from social media exposure to 1-1 and group communication still carries substantial risk of psychological hacking, but minimizing the network’s surface area and exposure to social media will still reduce risk substantially and possibly adequately. (Access to the technology itself is required in order to verify that adequacy, and the datasets large and secure enough to actually run manipulation research are likely limited to sufficiently large tech companies and intelligence agencies, like Facebook and the NSA, and less accessible to smaller, weaker orgs like the Department of Homeland Security or Twitter/X, which are vulnerable to hacking and data poisoning by larger orgs. There are also hard-to-verify rumors of sophisticated sensor systems deployed by JP Morgan Chase, and reports that many state-linked Chinese companies and institutions have been experimenting with large sample sizes of deployed electroencephalograms.)
The current attack surface for psychological hacks in the AI safety community is excessive and extreme, and even the bare-minimum solutions will receive pushback: phone webcams are difficult to cover up, microphones are difficult to avoid, and social media uses gradient descent to find and utilize posts and combinations of posts that cause habit-forming behavior (e.g. optimizing to minimize quit rates causes the system to find and utilize bizarre combinations of posts that hook people in bizarre ways, such as creating a vague sense that life without social media is an ascetic, monk-like existence when in reality it is the default). Quitting major life habits is also difficult by default, so plausible deniability might already be baked in here, yet again.
I’ve included an optional description by Professor Mark Andrejevic, writing in the Routledge handbook of surveillance, as an intuition flood to help more people understand exactly why the world might literally already revolve around this technology. What’s impressive is that Andrejevic was writing in 2014 and gave no indication of knowing anything about AI; the systems he describes were feasible at the time with just data science and large amounts of user data. AI simply makes these systems even more capable of producing results.
There is no logical endpoint to the amount of data required by such systems… All information is potentially relevant because it helps reveal patterns and correlations...
At least three strategies peculiar to the forms of population-level monitoring facilitated by ubiquitous surveillance in digital enclosures have become central to emerging forms of data-driven commerce and security: predictive analytics, sentiment analysis, and controlled experimentation. Predictive analytics relies upon mining behavioral patterns, demographic information, and any other relevant or available data in order to predict future actions [by locating and observing similar people or people who share predictive traits]...
Predictive analytics is actuarial in the sense that it does not make definitive claims about the future acts of particular individuals, rather it traffics in probabilities, parsing groups and individuals according to how well they fit patterns that can be predicted with known levels of accuracy. Predictive analytics relies on the collection of as much information as possible, not to drill down into individual activities, but to unearth new patterns, linkages, and correlations...
In combination with predictive analytics, the goal of sentiment analysis is both pre-emptive and productive: to minimize negative sentiment and maximize emotional investment and engagement; not merely to record sentiment as a given but to modulate it as a variable. Modulation means constant adjustment designed to bring the anticipated consequences of a modeled future into the present so as to account for these in ways that might, in turn, reshape that future. If, for example, sentiment analysis reveals resistance to a particular policy or concerns about a brand or issue, the goal is to manage these sentiments in ways that accord with the interests of those who control, capture, and track the data. The goal is not just the monitoring of populations but also their management...
Such forms of management require more than monitoring and recording a population—they also subject it to ongoing experimentation. This is the form that predictive analytics takes in the age of what Ian Ayres (2007) calls super-crunching: not simply the attempt to get at an underlying demographic or emotional “truth”; not even the ongoing search for useful correlations, but also the ongoing generation of such correlations. In the realm of marketing, for example, data researchers use interactive environments to subject consumers to an ongoing series of randomized, controlled experiments. Thanks to the interactive infrastructure and the forms of ubiquitous surveillance it enables, the activities of daily life can be captured in virtual laboratories in which variables can be systematically adjusted to answer questions devised by the researchers...
The infrastructure for ubiquitous interactivity and thus ubiquitous surveillance transforms the media experiences of daily life into a large laboratory...
Such experiments can take place on an unprecedented scale—in real time. They rely upon the imperative of ubiquity: as much information about as many people as possible...
Taken together, the technology that captures more and more of the details of daily life and the strategies that take advantage of this technology lead to a shift in the way those who use data think about it. Chris Anderson argues that the dramatic increase in the size of databases results in the replacement of data’s descriptive power by its practical efficacy. The notion of data as referential information that describes a world is replaced by a focus on correlation. As Anderson (2008) puts it in “The End of Theory”:
“This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology … Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves”
This is a data philosophy for an era of information glut: it does not matter why a particular correlation has predictive power; only that it does. It is also a modality of information usage that privileges those with access to and control over large databases, and, consequently, it provides further incentive for more pervasive and comprehensive monitoring.
When the US government, for example, exempts surveillance provisions in the USA Patriot Act from the accountability provided by the Freedom of Information Act, it is sacrificing accountability in the name of security… Threats by countries including India and the United Arab Emirates to ban Blackberry messaging devices unless the company provides the state with the ability to access encrypted information further exemplify state efforts to gain access to data generated in yet another type of digital enclosure...
A loss of control over the fruits of one’s own activity. In the realms of commercial and state surveillance, all of our captured and recorded actions (and the ways in which they are aggregated and sorted) are systematically turned back upon us by those with access to the databases. Every message we write, every video we post, every item we buy or view, our time-space paths and patterns of social interaction all become data points in algorithms for sorting, predicting, and managing our behavior. Some of these data points are spontaneous—the result of the intentional action of consumers; others are induced, the result of ongoing, randomized experiments...
Much will hinge on whether the power to predict can be translated into the ability to manage behavior, but this is the bet that marketers—and those state agencies that adopt their strategies—are making [in 2014].
Like most uses of multi-armed bandit algorithms to make social media steer people’s thinking in measurable directions, clown attacks do not require competent governments or intelligence agencies in order to be successfully deployed against the minds of millions of people. Everything required for this technology is easy to access, except for the massive amounts of human behavioral data. It simply requires sufficient access to social media platforms, plus the technological sophistication of one software engineer who understands multi-armed bandit algorithms and one data scientist who can statistically measure human behavior; for people with authority over or involvement in extant mass surveillance systems, the bar is even lower, since enough data means that anyone could eyeball the effects. Measurability is still mandatory for multi-armed bandit algorithms to work, since nobody can see directly into the human mind (although there are many peripheral proxies like fMRI data, heart rate, blood pressure, verbal statements and tone, body posture, subtle changes in hand and eye movements after reading certain concepts, etc., and many of these can be constantly measured by a hacked smartphone).
Lie detectors and clown attacks are the two strongest case studies I’m aware of that would cause AI to dominate global affairs and put AI safety in the crossfire. Whether or not this has already happened is largely a question of the math: are AI capabilities to do these things obvious enough to engineers to tell us that major governments have probably already built the tech? A recent study indicated that a massive proportion of computer vision research is heavily optimized for human behavior research, analysis, and manipulation, and is surreptitiously mislabeled and obfuscated in order to conceal the human research and human use cases that form the core of the computer vision papers, e.g. a consistent norm of referring to human research subjects as “objects” instead of subjects.
There’s just a large number of human manipulation strategies that are trivial to discover and exploit, even without AI (although the situation is far more severe when you layer AI on top); it’s just that they weren’t accessible at all to 20th-century institutions and technology such as academic psychology. If attackers get enough data on people who share similar traits with a specific human target, then they don’t have to study the target as much to predict the target’s behavior: they can just run multi-armed bandit algorithms on those similar people to find manipulation strategies that already worked on individuals who share genetic or other traits. Although the average person here is much further out-of-distribution relative to the vast majority of people in the sample data, that is a technical problem, and it shrinks as AI capabilities and compute become dedicated to the task of sorting signal from noise and finding webs of correlation with less data. Clown attacks alone have demonstrated that zero days in the brain are fairly consistent among humans, meaning that sample data from millions or billions of people is usable to find a wide variety of zero days in the brains that make up the AI safety community.
First it started working on 60% of people, and I didn’t speak up, because my mind wasn’t as predictable as people in that 60%. Then it started working on 90% of people, and I didn’t speak up, because my mind wasn’t as predictable as the people in that 90%. Then it started working on me. And by then, it was too late to speak up, because it was already working on me.
Social media’s ability to mix and match (or mismatch) people in specific ways and at great scale, as well as to use bot accounts just to upvote and signal-boost specific messages so that they look popular, has already yielded powerful effects. Adding LLM bots into the equation merely introduces more degrees of freedom.
Among a wide variety of capabilities, platforms like Twitter/X were capable of fiddling with their news feed algorithms such that users are incentivised to output as many words as possible in order to gain more likes/points/status, or combinations of words as high-quality as they can muster, rather than whatever maximizes the compulsion to return to the platform the next day (a compulsion that is very, very easy to measure by looking at user retention rates and quit rates; and if it is easy to measure, then it is easy to maximize via gradient descent). However, if a social media platform like Twitter/X does not optimize for competitiveness against other social media platforms, and the other platforms do, then every subsequent day people will return to the other platforms more and Twitter less. The state of Moloch is similar to the thought experiment where 4 social media platforms run multi-armed bandit algorithms to find ways to increase user engagement by 3 minutes per person per day. If one platform of the four (let’s imagine that it’s Twitter/X) eventually notices that people are spending 9 hours a day on social media, recoils in horror, and decides to cease that policy and instead attempt to increase user engagement by 0 hours per person per day, then the autonomous multi-armed bandit algorithms running on the other platforms automatically select strategies that harvest minutes of that time from Twitter/X, the lone defector platform, in addition to harvesting minutes from users’ undefended off-screen IRL time, which is the undefended natural resource that’s easiest for social media systems to steal time from, like an intelligent civilization harvesting an inanimate natural resource such as plants or oil. The defector platform then loses its market share and is crowded out of the genepool.
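The four-platform thought experiment can be rendered as a toy simulation. Every number below is invented, and the “harvesting” rule is a deliberately crude stand-in for bandit-driven optimization; the sketch only illustrates the qualitative dynamic that a lone defector loses its share:

```python
# Toy sketch of the defector dynamic (all numbers are assumptions):
# each optimizing platform gains engagement both from undefended IRL
# time and by siphoning from any platform that stops optimizing.
def simulate(days, optimize_flags, grab_per_day=3 / 60):  # hours/day gained
    hours = [2.0, 2.0, 2.0, 2.0]  # starting engagement per platform
    for _ in range(days):
        for i, opt in enumerate(optimize_flags):
            if opt:
                hours[i] += grab_per_day  # harvested from off-screen time
                # optimizers also siphon a little from non-optimizers
                for j, other_opt in enumerate(optimize_flags):
                    if not other_opt and hours[j] > 0:
                        taken = min(0.01, hours[j])
                        hours[j] -= taken
                        hours[i] += taken
    return hours

# Platform 0 defects (stops optimizing); the rest keep their bandits on.
result = simulate(365, [False, True, True, True])
print(result)
```

Under these assumptions the defector’s engagement is ground down to nothing within a few months while the optimizers grow, which is the “crowded out of the genepool” outcome described above.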
People who can wield gradient descent as a weapon against other people are fearsome indeed (although only the type of person with access to user data who buys servers by the acre can do this effectively). They not only have the ability to try things that were already demonstrated to work on people similar to you; they also have the ability to select attack avenues in places you will not look and in ways you will not notice, because they have a large enough amount of human behavioral data showing them all the places where people like you did end up looking in the past.
If true, this technology would, by default, become the darkest secret of the big 5 tech companies, and one of the darkest secrets of American and Chinese intelligence agencies. This tech would be the biggest deliberate abuse of psychological research in human history by far, and its effectiveness hinges on the current paradigm where billions of people output massive amounts of sensor data and social media scrolling data (e.g. the detailed pace at which different kinds of people scroll through different kinds of information); although the effectiveness of the clown attack on those not specifically aware of it demonstrates that a general awareness of the risk is not sufficient to protect oneself.
What would the world look like if human thought research and steering technology won out and became hopelessly entrenched 5-10 years ago? Unfortunately, that world would look a lot like this one. There are billions of inward-facing webcams pointed at every face looking at every screen. The NSA stockpiles exploits in every operating system, and likely chip firmware as well. There are microphones in every room and accelerometers in the palm of every hand (which give access to heart rate and a wide variety of other peripheral biophysical data that correlates strongly with various cognitive and emotional behavior). The very existence of mass surveillance is known, but only due to Snowden, which was one occasion and probably just bad luck; and the main lesson of the Snowden revelations was not that mass surveillance happens at incredible scale, but that the intelligence agencies were wildly successful at concealing and lying about it for years (and subsequently reorganized around the principle of preventing more Snowdens). An epistemic environment declines further and further as social status, virtue signaling, and vague impressions dominate. And, last of all, international affairs is coming apart at the seams as the old paradigms die and trust vanishes. This is what a world looks like where powerful people have already gained the ability to access the human mind at a far deeper level than any target has access to; unfortunately, it is also what more mundane worlds look like, where human thought and behavior manipulation capabilities remained similar to 20th-century levels. However, I’ve made the case very strongly that such technology exists, and that due to fundamental mathematical/statistical dynamics and human genetic diversity, these systems fundamentally depend on covert large-scale deployment (e.g. social media) in order to get amounts of data large enough to run at all: multi-armed bandit algorithms sufficient to find novel manipulation strategies in real time, and measurement and research of the human thought process sufficient to use multi-armed bandit algorithms and SGD to steer a target’s thoughts in measurable directions. Therefore, the burden of proof falls even more heavily on the claim that our minds are safe, safe enough for the AI safety community to survive the 2020s at all, not on the claim that our minds are not secure and represent a severe point of failure.
The question of whether the cognitive warfare situation has already become severe must be approached with sober analysis, not vibes and vague impressions. Vibes and vague impressions are by far the easiest thing to hack, as demonstrated by the clown attack; and in a world where the situation was acute, keeping people receptive and vulnerable to influence would be one of the most probable attacks to expect to be commonplace.
New technology actually does make current civilization out-of-distribution relative to civilization over the last 100 or 1,000 years, and thus risks terminating norms, dynamics, and assumptions that have made everything in civilization go fine so far, such as humans being better at lying than at detecting lies, and thus not being capable of organizing themselves based on statements of fact like “I am 100% loyal” or “here is an accurate list of the corrupt and incompetent people on this team”. The specific state of human controllability dominated global affairs, e.g. via military recruitment, and when this controllability ratcheted up slightly in the 19th century, it resulted in the total war paradigm of the World War era and the information war paradigm of the Cold War era. Assuming that history will repeat itself, and remain as sensible and intuitive as it always was, is like expecting a psychological study to replicate in an out-of-distribution environment. It certainly might.
With the superior capabilities to research the human mind offered by the combination of social media and AI, governments, tech companies, and intelligence agencies now have the capability to understand aggregate consumer demand better than ever before and to manipulate consumer spending and saving in real time. These capabilities were first sought in the 1980s by the Reagan and Thatcher administrations and never fully reached, but that was with 20th-century technology: no social media, no mass surveillance, no user data or sensor data, no AI; only psychology (much of which would not replicate, due to a study paradigm that remains inferior to mass surveillance), statistics, focus groups, and polls, each of which were new at the time, and each of which remain available to governments, tech companies, and intelligence agencies today to supplement their new capabilities. It is for this reason that the paradigm of recessions, a paradigm solidified in the 20th century, is a paradigm that we might expect to die, along with many other civilizational paradigms that were only established in the 20th century due to that century’s relative absence of human behavior measurement and thought steering technology.
Taking a step back and looking at a fundamental problem
If there were intelligent aliens, made of bundles of tentacles or crystals or plants that think incredibly slowly, their minds would also have zero days that could be exploited, because any mind that evolved naturally would probably be like the human brain: a kludge of spaghetti code operating outside of its intended environment. They also would not even begin to scratch the surface of finding and labeling those zero days until, like human civilization today, they began surrounding thousands or millions of their kind with sensors that could record behavior several hours a day and find webs of correlations. Of course, if they didn’t have much (or any) genetic diversity, then it would be even easier to find those zero days, and vice versa for intense amounts of diversity. However, the power of the clown attack demonstrates that genetic diversity in humans is not sufficient to prevent zero days from being discovered and exploited; the drive to gain and avoid losing social status is hackable with current levels of technology (social media algorithms, multi-armed bandit algorithms, and sentiment analysis), indicating that many other exploits are findable as well with current levels of technology.
There isn’t much point in having a utility function in the first place if hackers can change it at any time. There might be parts that are resistant to change, but it’s easy to overestimate yourself on this; for example, if you value the longterm future and think that no false argument can persuade you otherwise, but a social media news feed plants paranoia or distrust of Will MacAskill, then you are one increment closer to not caring about the longterm future; and if that doesn’t work, the multi-armed bandit algorithm will keep trying until it finds something that does. The human brain is a kludge of spaghetti code, so there’s probably something somewhere. The human brain has zero days, and the capability and cost for social media platforms to use massive amounts of human behavior data to find complex social engineering techniques is a profoundly technical matter; you can’t get a handle on it with intuition or pre-2010s historical precedent. Thus, you should assume that your utility function and values are at risk of being hacked at an unknown time, and should therefore assign them a discount rate to account for the risk over the course of several years. Slow takeoff over the next 10 years alone guarantees that this discount rate is too high in reality for people in the AI safety community to go on believing that it is something like zero. I think that approaching zero is a reasonable target, but not with the current state of affairs, where people don’t even bother to cover up their webcams, have important and sensitive conversations about the fate of the earth in rooms with smartphones, and use social media for nearly an hour a day (scrolling past nearly a thousand posts). The discount rate in this environment cannot be considered “reasonably” close to zero when the attack surface is this massive and the world is changing this quickly.
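The discount-rate point can be made concrete with a one-line geometric model. The 5%/year compromise probability below is purely an assumption for illustration, not an estimate:

```python
# Hedged sketch: if your values face an annual probability p of being
# covertly altered, the expected fraction of future utility still
# governed by your current values after t years shrinks geometrically.
def surviving_fraction(p_hack_per_year, years):
    return (1.0 - p_hack_per_year) ** years

# Even a modest assumed 5%/year compromise risk compounds over a decade:
print(round(surviving_fraction(0.05, 10), 3))  # → 0.599
```

Under that assumption, roughly 40% of a decade’s expected value is already forfeit, which is what “a discount rate that is not something like zero” means in practice.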
If people have anything they value at all, and the AI safety community probably does, then the current AI safety paradigm of zero effort is wildly inappropriate; it is basically total submission to invisible hackers.
The sheer power of psychological influence technologies tells us that we should stop thinking of cybersecurity as a server-only affair. Humans can also make mistakes as extreme as using the word “password” as a password, except it is your mind, your values, and your impressions of different lines of reasoning that get hacked, not your files or your servers or your bank passwords/records. In order to survive slow takeoff and persist for as long as necessary, the AI safety community must acknowledge the risk that the 2020s will be intensely dominated by actors capable of using modern technology to stealthily hack the human mind and eliminate inconvenient people, such as those who try to pause AI, even though AI is basically the keys to their kingdom. We must acknowledge that the future of cyberwarfare doesn’t just determine who gets to have their files and verbal conversations be private; in the 2020s, it determines what kinds of thoughts people get to have, and what kinds of people do and don’t get to have them at all (e.g., as one single example, clowns do have the targeted thoughts and the rest don’t).
Everything that we’re doing here is predicated on the assumption that powerful forces, like intelligence agencies, will not disrupt the operations of the community e.g. by inflaming factional conflict with false flag attacks attributed to each other due to the use of anonymous proxies.
Most people in AI safety still think of themselves as ordinary members of the population. In reality, this stopped being true a while ago; a bunch of nerds discovered an engineering problem that, as it turns out, the universe actually does revolve around, and a bunch of nerds messing around with technology that is central to geopolitics bears a reasonable chance that geopolitics will bite back, especially in a world where intelligence agencies have, for decades, messed with and utilized influential NGOs in a wide variety of awful ways, and in a world where intensely powerful influence technologies like clown attacks become stronger and stronger determinants of geopolitical winners and losers.
If left to their own devices, people’s decisions about technology and social media will be dominated by their self-concept as an average member of the population, who considers things like news feeds and smartphone sensors/uncovered webcams safe because everyone’s doing it and of course nothing bad would happen to them; when the cybersecurity reality is that the risk of psychological engineering is extreme in a mathematically demonstrable way, and this nonchalance towards strangers tampering with and hacking your cognition and utility function is an unacceptable standard for the group of nerds who actually discovered an engineering problem that this side of the universe revolves around.
The attack surface of the AI safety community is like the surface area of the interior of a cigarette filter: thousands of square kilometers, wrinkled and folded together inside the spongy three-dimensional interior of a 1-inch-long cigarette filter. This is not the kind of community that survives a transformative world.
AI pause as the turning point
From Shutting Down the Lightcone Offices:
I feel quite worried that the alignment plan of Anthropic currently basically boils down to “we are the good guys, and by doing a lot of capabilities research we will have a seat at the table when AI gets really dangerous, and then we will just be better/more-careful/more-reasonable than the existing people, and that will somehow make the difference between AI going well and going badly”. That plan isn’t inherently doomed, but man does it rely on trusting Anthropic’s leadership, and I genuinely only have marginally better ability to distinguish the moral character of Anthropic’s leadership from the moral character of FTX’s leadership...
In most worlds RLHF, especially if widely distributed and used, seems to make the world a bunch worse from a safety perspective (by making unaligned systems appear aligned at lower capabilities levels, meaning people are less likely to take alignment problems seriously, and by leading to new products that will cause lots of money to go into AI research, as well as giving a strong incentive towards deception at higher capability levels)… The EA and AI Alignment community should probably try to delay AI development somehow, and this will likely include getting into conflict with a bunch of AI capabilities organizations, but it’s worth the cost.
I don’t have much to contribute to the calculations behind this policy (which I’d like to note is just musing by Habryka intended to elicit further discussion, and I might be taking it out of context), other than describing in great detail what a “conflict with a bunch of AI capabilities organizations” would look like, which I’ve been researching for years, and it is not pretty; the asymmetry is so serious that even thinking about waging such a conflict could begin closing the window of opportunity for you to make moves, e.g. if some of the people you talk to end up using social media and scrolling past specific bits of information at a pace similar to, say, people who ultimately ended up seeing big tech companies as enemies in a cold, calculating, and serious way, not in an advocacy way. Sample sizes of millions of people make that kind of prediction possible; even if there are only a dozen people in the data set of positive cases of cold big-tech enmity, there are millions of people who make up the data set of negative cases, allowing analysts to get an extremely good idea of what a potential threat looks like by knowing what a potential threat doesn’t look like. This is only one example, and it is entirely social-media-based; it does not use, say, automated analysis of audio data from recorded conversations near hacked smartphones, which is very unambiguously the kind of thing that can be expected to happen to people who would “get into conflict with a bunch of AI capabilities organizations”, as those organizations tend to have strong ties, and possibly even substantial revolving-door employment, with intelligence agencies; Facebook/Meta is a good example, as they routinely find themselves at the center of public opinion and information warfare conflicts around the world.
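The “knowing what a potential threat doesn’t look like” point is ordinary anomaly detection: with a huge sample of negative cases, an analyst can flag outliers by distance from the population without needing many positives at all. A minimal sketch (the behavioral feature and every number here are hypothetical):

```python
import statistics

# Hedged sketch: flag candidates whose behavioral feature sits far
# outside the distribution of the large "negative" population.
def zscore_flags(population, candidates, threshold=3.0):
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)
    return [c for c in candidates if abs(c - mu) / sigma > threshold]

# Toy feature: seconds spent dwelling on a specific class of post.
population = [1.0 + 0.1 * (i % 7) for i in range(10000)]  # "everyone else"
candidates = [1.2, 1.3, 9.5]  # the last one is far out-of-distribution
print(zscore_flags(population, candidates))
```

Nothing about the flagged individual needs to be understood; they are simply unlike the millions of non-cases, which is exactly the asymmetry described above.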
It’s also unclear to me how sovereign these companies’ security departments are without continued logistical support and staff from American intelligence agencies, as they have to contend with intense interest from a wide variety of foreign intelligence agencies. There is an entire second world here that is 1) parallel to the parts of the ML community that are visible to us, 2) vastly more powerful, privileged, and dangerous than the ML community, and 3) has a massive vested interest in the goings-on of the ML community. I’ve encountered dozens and dozens of people in the AI safety community in both SF/Berkeley and DC, and if a single person was aware of this second world, they were doing an incredibly good job of totally hiding their awareness of it. I think this is a recipe for disaster. I think that the AI safety community is not even thinking about the kinds of manipulation, subterfuge, and sabotage that would take place here, just based off of this world’s lawyers-per-capita alone and the fact that this is a trillion-dollar industry; let alone that this is a trillion-dollar industry due in part to the human influence capabilities I’ve barely begun to describe here; let alone due to the interest that these capabilities attracted from all the murkiest people lurking within the US-China conflict.
The attempt to open-source Twitter/X’s news feed algorithm may have been months ago, but even if it was a step in the right direction, repeatedly attempting projects like that would cause excessive disruption and delegitimization for the industry, particularly for Facebook/Meta, which will never be able to honestly open-source its systems’ news feed algorithms. Facebook and the other four large tech companies (of which Twitter/X is not yet a member, due to vastly weaker data security) might be testing out their own pro-democracy anti-influence technologies and paradigms, akin to Twitter/X’s open-sourcing of its algorithm, but behind closed doors due to the harsher infosec requirements that the big 5 tech companies face. Perhaps there are ideological splits among executives, e.g. with some executives trying to find a solution to the influence problem because they’re worried about their children and grandchildren ending up as floor rags in a world ruined by mind control technology, and other executives nihilistically marching towards increasingly effective influence technologies so that they and their children personally have better odds of ending up on top instead of someone else. Twitter/X’s measured pace, open-sourcing the algorithm and then halting several months afterwards, is therefore potentially a responsible and moderate move in the right direction, especially considering the apparent success of the community notes paradigm at improving epistemics.
The AI safety community is now in a situation where it has to do everything right. The human race must succeed at this task, even though the human brain didn’t evolve to do well at things like having the entire species coordinate to succeed at a single task. Especially if that task, AI alignment, might be absurdly difficult entirely for technical reasons: as difficult for the human mind to solve as expecting chimpanzees to figure out enough rocket science to travel to the moon and back, which would be a big ask regardless of the chimpanzees’ instinctive tendency to form factions and spend 90% of their thinking on clever plots to outmaneuver and betray each other. This means that vulnerability to clown attacks is unacceptable for the AI safety community, and it is also unacceptable to be vulnerable to other widespread social engineering techniques that exploit zero days in the human brain. The degree of vulnerability is highly measurable by attackers, and increasingly so as technology advances; and since it is legible that attackers will be rewarded for exploiting vulnerabilities, attackers are therefore incentivised to exploit those vulnerabilities and steer the AI safety community over a cliff.
The AI safety community has long passed a threshold where vulnerability to clown attacks is no longer acceptable; not only does it incentivize more clown attacks, and more ambitious clown attacks, where the attackers have more degrees of freedom, but the AI safety community is in a state where clown attacks can thwart many of the tasks required to do AI safety at all.
Lots of people approach AI social control as though solving AI alignment is priority #1 and the use of AI social control is priority #2. However, this attitude is not coherent. AI alignment is #1, and AI social control is #1a, a subset of #1 with virtually no intrinsic value on its own, only instrumental value to AI alignment, as the use of AI for social control would incentivise accelerating AI and complicate alignment efforts in the meantime, either by direct sabotage by intelligence agencies or AI companies, or by causing totalitarianism or all-penetrating crushing information warfare between the US and China, or some other state of civilization that we might fail to adapt to.
How to protect yourself and others:
Either stop reasoning based on vibes and impressions (hard), or stop spending hours a day inside hyperoptimized vibe/impression-hacking environments (social media news feeds). This may sound like a big ask, but, like cryopreservation, it actually isn’t: everyone on earth happens to be doing it catastrophically wrong, and it’s actually a super quick fix; less than a few days, or even a few hours, and your entire existence is substantially safer. But more than 99% of people on earth will look at you as if you were wearing a clown suit, and that intimidates people away from specific lines of thought in a very deep way.
A critical element is for as many people as possible in AI safety to cover up their webcams; facial microexpressions are remarkably revealing, especially to people with access to billions of hours of facial microexpression data of people in flexible, hypercontrolled environments like the standard social media news feed. Facial microexpressions are the output of around 60 different facial muscles, and although modern ML probably can’t compress each frame of a video file to a 6 by 10 matrix that represents how contracted each facial muscle is, modern ML can probably compress, say, every 5 frames into a 128 by 128 matrix that represents the state of the 20% of the facial muscles that project ~80% of the valuable data/signal outputted by facial muscles. Minimizing exposure to other sensors is ideal as well; it seems pretty likely that a hacked OS can turn earbuds into microphones, so definitely switch to used speakers. I use a wireless keyboard so that I don’t have to worry about the accelerometers in my laptop, and a trackball mouse that I use with my off hand. Some potential exploits, like converting RAM chips or accelerometers into makeshift microphones, mean that it might be difficult to prevent laptops from recording enough audio to acquire changes in your heart rate itself, but there’s a wide variety of other biophysical data that it’s not in your best interest to donate (if you’re going to sell shares of your utility function, at least don’t sell them for zero dollars).
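The 80/20 compression claim above is a standard property of signals whose variance is spread very unevenly across channels. A hedged toy sketch (the 60 “channels” and their power-law variance profile are invented for illustration, not measured from facial data):

```python
# If signal variance falls off steeply across channels ("muscles"),
# a small subset of channels carries most of the usable information.
def top_k_variance_share(variances, k):
    total = sum(variances)
    return sum(sorted(variances, reverse=True)[:k]) / total

# 60 hypothetical channels with a power-law-ish variance profile:
variances = [1.0 / (rank ** 1.2) for rank in range(1, 61)]
share = top_k_variance_share(variances, 12)  # top 20% of 60 channels
print(round(share, 2))
```

Under this assumed profile, the top fifth of channels captures well over half the total variance, which is the shape of the claim that a compressed representation loses little of the signal.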
It’s probably a good idea to switch to physical books instead of ebooks. Physical books do not have operating systems or sensors. You can also print out research papers and Lesswrong and EAforum articles that you already know are probably worth reading or skimming; if you drive to the store and actually spend 15 minutes looking, you will probably find an incredibly ink-efficient printer.
I’m not sure whether a text-only inoculation is good enough, or whether there’s no way around the problem other than reducing people’s exposure to sensors and social media. Reading Yudkowsky’s rationality sequences will definitely make one’s thoughts and behaviors harder to predict, but it won’t patch or rearrange the zero days in the human brain. Even Tuning your Cognitive Strategies and Raemon’s Feedbackloop-First Rationality won’t do that, although they might go a long way towards making it much harder to gather data on the details of those zero days by comparing your mind and behavior to the minds and behavior of the millions of people that make up the data, because your mind will just be more different from those people than before, and you will be different from them in very different ways from how they are different from each other. You will be out-of-distribution, and the distribution is a noose that is slowly tightening around the mind of most people on earth. If reading the sequences seems daunting, I recommend starting with the highlights, or randomly selecting posts from them out of order, which is a fantastic way to start the day. HPMOR is also an incredibly fun and enjoyable alternative to read in your downtime if you’re busy (and possibly one of the most rereadable works of fiction ever written), and Planecrash is a similarly long and ambitious work by Yudkowsky that is darker and less well-known, even though the self-improvement aspect seems more refined and effective. Other safe predictability reducers include Scott Alexander’s codex for a more world-modeling focus, and the CFAR handbook for more practical self-enhancement work. These are all highly recommended for self-improvement.
It’s probably best to avoid sleeping in the same room as a smart device, or anything with sensors, an operating system, and a speaker. The attack surface seems large: if the device can tell when people’s heart rate is near or under 50 bpm, then it can test all sorts of things, e.g. prompting sleep talk on specific topics. The attack surface is large because there’s effectively zero chance of getting caught, so if people were to experiment with that, it wouldn’t matter how low the probability of success is for any given idea, or whether they were just engaging in petty thuggery. Just drive to the store and buy a clock; it will be like $30 at most, and that’s it.
Of course, you should be mindful of what you know and how you know it, but in the context of modern technologies, it’s more important to pay attention to your feelings, vibe, and impression of specific concepts; if you successfully search your mind and trace the origins of those feelings, you might notice the shadows cast by the clowns that have fouled up targeted concepts (you probably don’t remember the clowns themselves, as they are nasty internet losers). But that’s just one zero day, and the problem is bigger and more fundamental than that. Patching the clown attack is valuable, but it’s still a band-aid solution that will barely address the root of the issue.
I am available via LW message to answer any questions you might have.
Would you mind if I rewrote this in a less “manic” tenor, keeping the content and mood largely the same, and reposted? I like this essay and think the core of what you’re suggesting is reasonable, for reasons both stated and unstated, but I would like to try to say it differently in a way that I think will be taken better.
I’ve been hoping for years that someone else could do this instead of me; I did this research to donate it, and if I’m the wrong person to communicate it (e.g. I myself am noise/ambient-clowning the domain) then that’s on me and I’d be grateful for that to be fixed.
Would be up for this project. As is, I downvoted Trevor’s post for how rambly and repetitive it is. There’s a nugget of an idea (that AI can be used for psychological/information warfare) that I was interested in learning about, but the post doesn’t seem to have much substantive argument to it, so I’d be interested in someone doing a much shorter version which argues its case with some sources.
My thinking about this is that it is a neglected research area with a ton of potential, and it is a very bad idea for ~1 person to be the only one doing it, so more people working on it would always be appreciated. It would also be in their best interest, because it is a gold mine of EV for humanity and also of deserved reputation/credit. So absolutely take it on.
Also, I think this post does have substantive argument and also sources. The argument I’m trying to make is that we’re entering a new age of human manipulation research/capabilities by combining AI with large sample sizes of human behavior data, and that the emergence of that kind of sheer power would shift a lot of goalposts. Finding evidence of a specific manipulation technique (clown attacks) was hard, but it was comparatively much easier to research the meta-process that generates techniques like that, and to argue that geopolitical affairs would pivot if mind control became feasible.
To be frank trevor, you don’t seem to have referenced or cited any of the extensive 20th century and prior literature on memetics, social theory, sociology, mass movements, human psychology in large groups, etc…
Which is likely what the parent was referring to.
Although I have read nowhere close to all of it, I’ve read enough to not see any novel substantive arguments or semi-plausible proofs.
Most LW readers don’t expect anything at the level of a formal mathematical or logical proof, but sketching out a defensible semi-plausible path to one would help a lot. Especially for a post of this length.
It also doesn’t help that you’re taking for granted many things which are far from decided. For example, the claim:
Sounds very exaggerated, because cryopreservation itself does not have as solid a foundation as you’re implying here.
No one has yet offered a physically plausible solution, even with unlimited amounts of computation and energy, for restoring a cryopreserved human with their neural structure and so on intact (one that fits within known thermodynamic boundaries).
It’s more of a ‘there is a 1-in-a-billion chance some folks in the future will stumble on a miracle and will choose to work on me’ or some variation. And people do it anyway, since even a tiny chance is better than nothing in their books.
That was my mistake, when I said “mathematically provable”, I meant provable with math, not referring to a formal mathematical proof or logical proof. I used that term pretty frequently so it was a pretty big mistake.
The dynamic is pretty fundamental though. I refer to it in the “taking a step back” section:
This comment needs to be frontpaged, stickied, added as a feature for people with >N karma (like “request feedback”), engraved on Mount Rushmore, tattooed on my forehead, and scrawled in a mysterious red substance in my bathroom.
What do you mean? Surely they aren’t offering this for anyone who writes anything manically. It would be nice if someone volunteered to do that service more often though.
My comment is partly a manic way of saying “yes this is a good service which more people should both provide and ask for”. Not sure how practical it would be to add as a feedback-like formal feature. And of course I don’t think lc should personally be the one to always do this.
LessWrong already has a formal feature to ask for feedback for anyone with >100 karma.
Yes. I’m aware of that. I noted it in the “(like “request feedback”)” part of my comment.
This post is fun but I think it’s worth pointing out that basically nothing in it is true.
-”Clown attacks” are not a common or particularly effective form of persuasion
-They are certainly not a zero day exploit; having a low status person say X because you don’t want people to believe X has been available to humans for our entire evolutionary history
-Zero day exploits in general are not a thing you have to worry about; it isn’t an analogy that applies to humans because we’re far more robust than software. A zero day exploit on an operating system can give you total control of it; a ‘zero day exploit’ like junk food can make you consume 5% more calories per day than you otherwise would.
-AI companies have not devoted significant effort to human thought steering, unless you mean “try to drive engagement on a social media website”; they are too busy working on AI.
-AI companies are not going to try to weaponize “human thought steering” against AI safety
-Reading the sequences wouldn’t protect you from mind control if it did exist
-Attempts at manipulation certainly do exist but it will mostly be mass manipulation aimed at driving engagement and selling you things based off of your browser history, rather than a nefarious actor targeting AI safety in particular
The “just five percent more calories” example reveals nicely how meaningless this heuristic is. The vast majority of people alive today are the effective mental subjects of some religion, political party, national identity, or combination of the three, no magical backdoor access necessary; the confirmed tools and techniques are sufficient to ruin lives or convince people to do things completely counter to their own interests. And there are intermediate stages of effectiveness that political lobbying can ratchet up along, between the ones they’re at now and total control.
The premise of the above post is not that AI companies are going to try to weaponize “human thought steering” against AI safety. The premise of the above post is that AI companies are going to develop technology that can be used to manipulate people’s affinities and politics, intel agencies will pilfer it or ask for it, and then it’s going to be weaponized, to a much greater degree of effectiveness than they have been able to afford historically. I’m ambivalent about the included story in particular being carried out, but if you care about anything (such as AI safety), it’s probably necessary that you keep your utility function intact.
Yes they are, clown attacks are an incredibly powerful and flexible form of Overton window manipulation. They can even become a self-fulfilling prophecy by selectively sorting domains of thought among winners and losers in real life, e.g. only losers think about lab leak hypothesis.
It’s a zero-day exploit because it’s a flaw in the human brain that modern systems are extremely capable of utilizing to steer people’s thinking without their knowledge (in this case, denial of certain lines of cognition). You’re right that it’s not new enough to count days, like a zero day in computers, but it has only been exploited this powerfully (orders of magnitude more effectively than ever before) for less than a decade.
Like LLMs, the human mind is sloppy and slimy; clown attacks are an example of something that multi-armed bandit algorithms can repeatedly try until something works (the results always have to be measurable though).
I’m thinking the big 5 tech companies (Facebook, Amazon, Apple, Google, Microsoft) and intelligence agencies like the NSA and Chinese agencies. I am NOT thinking about e.g. OpenAI here.
I made the case that these agencies have historically unprecedented amounts of power, and since AI is the keys to their kingdom, trying to establish an AI pause does indeed come with that risk.
I might be wrong about The Sequences hardening people, but I think these systems are strongly based on human behavior data, and if most of the people in the data haven’t read The Sequences, then people who read The Sequences are further OOD than they would have been and therefore less predictable.
I agree that profit-driven manipulation might still be primary and was probably how human manipulation capabilities first emerged and were originally fine-tuned, probably in the early-mid 2010s. But since these are historically unprecedented degrees of power over humans, and due to international information warfare e.g. between the US and China (which is my area of expertise), I doubt that manipulation capabilities remained exclusively profit-driven. I think that it’s possible that >90% of people at each of the tech companies haven’t worked on these systems, and of the 10% who have, it’s very possible that 95% of those people only work on profit-based systems. But I also think that there are some people who work on geopolitics-prioritizing manipulation too, e.g. revolving door employment with intelligence agencies.
“Clown attack” is a phenomenal term, for a probably real and serious thing. You should be very proud of it.
I think that the people at Facebook/Meta and the NSA probably already coined a term for it, likely an even better one as they have access to the actual data required to run these attacks. But we’ll never know what their word was anyway, or even if they have one.
On clown attacks, it’s notable that activist egregores conduct them autonomically, by simply dunking on whoever they find easiest to dunk on; the dveshi hoists the worst representatives of ideas ahead of the good ones.
I absolutely agree! Part of what makes clown attacks so powerful is the plausible deniability; most clowns are not attacks. As a result, attackers have plenty of degrees of freedom to try things until something works, so much so that they can even automate that process with multi-armed bandit algorithms, because there’s basically no risk of getting caught.
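The automated trying-things-until-something-works loop described in this thread can be sketched abstractly. Below is a minimal epsilon-greedy multi-armed bandit in Python; the arm names and payoff probabilities are invented for illustration, and nothing here is claimed to resemble any real platform’s system:

```python
import random

def epsilon_greedy(arms, reward_fn, rounds=1000, epsilon=0.1, seed=0):
    """Generic epsilon-greedy bandit: repeatedly try strategies ("arms"),
    track a running mean reward per arm, and increasingly concentrate
    trials on whichever arm has worked best so far."""
    rng = random.Random(seed)
    counts = {arm: 0 for arm in arms}
    means = {arm: 0.0 for arm in arms}
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.choice(arms)                    # explore a random arm
        else:
            arm = max(arms, key=lambda a: means[a])   # exploit the best arm
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return means, counts

# Hypothetical arms with fixed (unknown-to-the-loop) success probabilities.
probs = {"angle_a": 0.02, "angle_b": 0.05, "angle_c": 0.11}
means, counts = epsilon_greedy(
    list(probs),
    lambda arm, rng: 1.0 if rng.random() < probs[arm] else 0.0,
)
```

The structural point is the one made in the comment: the loop needs no model of why an arm works, only a measurable reward signal, which is also why effects it cannot measure are invisible to it.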
Scrolling down this almost stream-of-consciousness post against my better judgement, unable to look away, perfectly mimicked scrolling social media. I am sure you did not intend it but I really liked that aspect.
Loads of good ideas in here; generally I think modelling the alphabet agencies is much more important than implied by discussion on LW. Clown attack is a great term, although I’m not entirely sure how much the personal-prevention layer of things really helps the AI safety community, because the nature of clown attacks seems like a blunt tool you can apply to the public at large to discredit groups. So, primarily the vulnerability of the public to these clown attacks is what matters, and that is much harder to change.
Yes, the whole issue is that you need to see the full picture in order to understand the seriousness of the situation. For example, screen refresh rate manipulation is useless without eyetracking, and eyetracking is useless without refresh rate manipulation, but when combined together, they can become incredibly powerful targeted cognition inhibitors (e.g. giving people barely-noticeable eyestrain every time that a targeted concept is on their screen). I encountered lots of people who were aware of the power of A/B testing, but what makes A/B testing truly formidable is that AI can be used to combine it with other things, because AI’s information processing capabilities can automate many tasks that previously bottlenecked human psychological research.
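On the A/B testing point specifically, the power of large sample sizes is easy to show with a standard two-proportion z-test; the conversion numbers below are made up for illustration:

```python
from math import sqrt

def ab_significant(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-proportion z-test: is variant B's conversion rate significantly
    different from variant A's at roughly the 95% confidence level?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return abs(z) > z_crit, z

# A 1.5-percentage-point effect (5.0% vs 6.5%) is imperceptible per user,
# but with 10,000 users per arm it is detected decisively (z is about 4.6).
sig, z = ab_significant(500, 10_000, 650, 10_000)
```

With only a few hundred users per arm, the same effect would be indistinguishable from noise; scale is what turns small per-user effects into a reliable optimization signal.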
Two decades ago the ChaosComputerClub declared “We lost the War” and articulated the intellectual framework based on which Wikileaks was built.
Today, many more people come to the yearly Chaos Computer Congress but on the other hand, a lot less political action is emerging from the club. While it’s not clear to what extent the CIA is responsible for that, it would make sense from their perspective to act here.
While their campaign was about Julian Assange being seen as a clown, they didn’t do the same thing to the Chaos Computer Club.
A few other people were targeted as well, but generally people who were seen as having done something explicitly blameworthy. For anyone who’s interested in how this works in practice, Andy Müller-Maguhn’s talk about what the CIA did against him is very worth watching.
There are lots of ways to simply distract someone from engaging in a powerful way with politics that are not about overt destruction.
If people associate having to think about AI safety with needing to be very paranoid that might in itself discourage people from thinking.
Agreed.
That’s pretty disturbing to hear, that AI safety could end up neutered, like what potentially happened there. There’s probably plenty of FBI informants in the Bay Area, who change identities and gain status in communities like we change clothes. But it’s possible that an AI pause actually is the minimum necessary ask for humanity to make it through (from AGI alone, of course, taking on the NSA is basically asking to end up like the CCC).
I agree, but it’s not just about distracting, being able to steer people’s thinking has all kinds of creative uses, especially in a civilization where the lawyers-per-capita is this high. For example, I wrote:
I agree that clown attacks seem to be possible. I accept a reasonably high probability (c. 70%) that someone has already done this deliberately; the wilful denigration of the Covid lab leak seems like a good candidate, as you describe. But I don’t see evidence that deliberate clown attacks are widespread. And specifically, I don’t see evidence that these are being used by AI companies. (I suspect that most current uses are by governments.)
I think it’s fair to warn against the risk that clown attacks might be used against the AI-not-kill-everyone community, and that this might have already happened, but you need a lot more evidence before asserting that it has already happened. If anything, the opposite has occurred, as the CEOs of all major AI companies signed onto the declaration stating that AGI is a potential existential risk. I don’t have quantitative proof, but from reading a wide range of media across the last couple of years, I get the impression that the media and general public are increasingly persuaded that AGI is a real risk, and are mostly no longer deriding the AGI-concerned as being low-status crazy sci-fi people.
I agree with some of this. I admit that I’ve been surprised several times by leading AI safety community orgs outperforming my expectations, from Openphil to MIRI to OpenAI. However, considering the rate that the world has been changing, I think that the distance between 2023 and 2033 is more like the distance between 2023 and 2003, and the whole point of this post is taking a step back and looking at the situation, which is actually pretty bad.
I think that between the US/China AI competition, and the AI companies also competing with each other under the US umbrella, as well as against dark AI companies like Facebook and companies that might be started indigenously by Microsoft or Apple or Amazon under their full control, and the possibility of the US government taking a treacherous turn and becoming less democratic more broadly (e.g. due to human behavior manipulation technology), I’m still pessimistic that the 2020s have more than a 50% chance of going well for AI safety. For example, the AI safety community might theoretically be forced to choose between rallying behind a pause vs. leaving humanity to die, and if they were to choose the pause in that hypothetical, then it’s reasonable to anticipate a 40% chance of conflict.
Possibly relevant: Weak Men are Superweapons.
This is probably one of the most important articles in the modern era. Unbelievable how little engagement it’s gotten.
Even now, I regret botching this post by writing and posting it as fast as possible, and making it too long (tl;dr).
I wrote a much shorter and better version, which also takes into account Zack M Davis’s Optimized Propaganda with Bayesian Networks which focuses on the ability to run internal interpretability on the process of human opinion formation by using large sample sizes of human data to track causality networks between different beliefs (in his example, it was just large sample sizes of human survey data, rather than the combination of large amounts of human biodata and news feed scrolling data).
What probability do you put on AI safety being attacked or destroyed by 2033?
Considering the rate that the world has been changing, I’d say that the distance between 2023 and 2033 is more like the distance between 2023 and 2003, and the whole point of this post is taking a step back and looking at the situation which is actually pretty bad, so I’d say ~30% because a lot of things will happen just in general, and under 20% would be naive.
Under 20% would require less than a 2% chance per year, and FTX alone blows that out of the water, let alone OpenAI. I think I made a very solid case that there are some really nasty people out there who already have the entire AI safety community by the balls, and if an AI pause is the minimum ask for humanity to survive, then you have to start conflict over it.
I don’t get if that’s your estimate for AI safety being attacked or for AI [safety] being destroyed by 2033. If that’s the former, what would count as an attack? If that’s the latter, what would you count as strong evidence that your estimate was wrong? (assuming you agree that AI safety still existing in 2033 counts as weak evidence given your ~30% prior).
You’re right, I had kind of muddled thinking on that particular point. My thinking was that they would try to destroy or damage AI safety and the usual tactics would not work because AI safety is too weird, motivated, and rational (although they probably would not have a hard time measuring motivation, sufficient to detect that it is much higher than in normal interest groups). I tend to think of MIRI as an org that they can’t pull the rug out from under because it’s hardened, e.g. it will survive and function in some form even if everyone else in the AI safety community is manipulated by gradient descent into hating MIRI, but realistically Openphil is probably much more hardened. It’s also hard to resolve because this tech is ubiquitous, so maybe millions of people get messed with somehow (e.g. deliberately hooked on social media 3 hours per day). What AI safety looks like after being decimated would probably be very hard to picture; for steelmanning purposes I will say that the 30% would apply to being heavily and repeatedly attacked and significantly damaged, well beyond the FTX crisis and the EA Forum shitposts.
Frankly, I think I should lose a lot of Bayes points if this technology is still 10 years away. I know what I said in the epistemic status section, but I actually do think that the invention and deployment of this tech is heavily weighted towards the late 2010s and during COVID. If this tech didn’t become a feature of slow takeoff then I would lose even more Bayes points.
That was not my thought (I consider interactive clarifications one of our most powerful tools, and pressure to produce perfect texts as counterproductive), but..
…I appreciate the concrete predictions very much, thanks!
That sounds close to when MIRI decided to keep everything secret. Maybe that was after a few clowns were advocating for transparency. 🤔
Not a direct response (and I wouldn’t want it to be read that way), but there is some old Gwern research on this topic: https://www.lesswrong.com/posts/TiG8cLkBRW4QgsfrR/notes-on-brainwashing-and-cults
My impression is that what trevor refers to as “brainwashing” and “mind control” is not actually “brainwashing as popularly understood”, i. e. a precision-targeted influence that quickly and unrecognizably warps the mind of an individual. Rather, what they have in mind is a more diffuse/incremental effect, primarily noticeable at population-wide scales and with the individual effects being lesser and spread across longer time periods — but those effects nevertheless being pivotal, when it comes to the fate of movements. And this is in fact a thing that we more or less know exists, inasmuch as propaganda and optimizing for engagement are real things.
Then there’s a separate claim building up on that, a speculation that AI and the Big Data may allow to supercharge these effects into something that may start to look like brainwashing-as-popularly-understood. But I think the context provided by the first claim makes this more sensible.
It’s the general tendency I’ve somewhat noticed with trevor’s posts — they seem to have good content, but the framing/language employed has a hint of “mad conspiracy-theory rambling” that puts people off. @trevor, maybe watch out for that? E. g., I’d dial down on terms like “mind control”, replace them with more custom/respectable-looking ones. (Though I get that you may be deliberately using extreme terms to signal the perceived direness of the issue, and I can’t really say they’re inaccurate. But maybe look for a way to have your cake and eat it too?)
Yeah, I spent several years staying quiet about this because I assumed that bad things happened to people who didn’t. When I realized that that was a vague reason, and that everyone I talked to also seemed to have vague reasons for not thinking about this, I panicked and wrote a post as fast as possible by typing up handwritten notes and stitching the paragraphs together. That was a pretty terrible mistake. By the time I realized that it was longer than EY’s List of Lethalities, it was already too late, and I figured that everyone would ignore it if I didn’t focus really hard on the hook.
I absolutely agree that this is a good way to look at things. For example, the 3-minutes-per-person-per-day moloch I referenced was a hypothetical bad future that a lot of people worried about, but as it turned out, the capability to use gradient descent to steer human behavior in measurable directions may have resulted in a good outcome, where the superior precision allows them to reduce quit rates while balancing that optimization against preventing overuse. This featured heavily in the Facebook files; whenever Facebook encountered some awful problem, the proposed solution was allegedly “we need better AI so we can optimize for things like that not happening”.
I don’t want to dismiss the potentially-high probability that things will just go fine; in fact, I actually covered that somewhat:
I’m just advocating for being prepared both for the good outcome and the bad one. I think that the 2020s will be a flashpoint for this, especially if it’s determined that an AI pause really is the minimum ask for humanity to survive (which is a reasonable proposition).
this very badly mismatches my firsthand experiences and observations of social movements in a way that makes me suspect a mismatch between what we even mean by the word “brainwashing” or something. Perhaps it is because I did not need the update this intends to convey, which is that brainwashing does not allow forcibly replacing your opinions, but rather works by creating social pressure to agree habitually with a group?
I think that Habryka was referring to the tendency for people to worry about mind control and manipulation, not comparing human manipulation via gradient descent to human manipulation via brainwashing.
Personally, I think that worrying about advances in human manipulation is always something to be concerned about, since the human brain is a kludge of spaghetti code, so surely someone would find something eventually (I argued here that social media dramatically facilitated the process of people finding things), and it naturally follows that big discoveries there could be transformative, even if there were false alarms in the past (with 20th century technology). But the fact that the false alarms happened at all is also worth consideration.
A challenge for folks interested: spend 2 weeks without media based entertainment.
I’d love it if people could try the basic precautions and see how harmless they are! Especially because they might be the minimum ask in order to avoid getting your brain and motivation/values hacked.
I guess there would be bonus points for avoiding watching videos that millions of other people have watched.
When I say media, I mean social media, movies, videos, books etc- any type of recording or something that you believe you’re using as entertainment.
I’m trying this myself. I’ve done single days before, sometimes 2 or 3 days, but failed to keep it consistent. I did find that when I did it, my work output was far higher and of greater quality, I had a much better sleeping schedule, and was generally in a much more enjoyable mood.
I also ended up spending more time with friends and family, meeting new people, trying interesting things, spending time outdoors, etc.
This time I’m building up to it- starting with 1 media free hour a day, then 2 hours, then 3, etc.
I think building up to it will let me build new habits which will stick more.
I predict (losing Bayes points if I’m wrong) that most people will have a similar experience, but I also predict that the best strategy is to quit cold turkey; nicotine does not run SGD to notice that user retention is at risk and autonomously take actions that were successful at mitigating risk in the past.
It would be hard for them to make their systems not optimize in weird ways due to Goodhart’s law; furthermore, anyone running a successful social media platform would need to give the algorithms wide leeway to experiment with user retention, since competitor platforms might be running systems that also autonomously form novel strategies.
Most of my knowledge on dependencies and addictions comes from a brief study I did on neurotransmitters’ roles in alcohol dependence/abuse while in school, for an EPQ, so I’m really not sure how much of this applies. Also, a lot of my study was finding that my assumptions were in the wrong direction (I didn’t know about endorphins). But I think a lot of the stuff on neurotransmitters and receptors holds across different areas; take it with some salt though.
Quitting cold turkey rarely ever works for addictions/dependencies. The vast majority of time the person has a big resurgence in the addiction.
The balance of dopamine/sensitivity of the dopamine receptors often takes time to shift back.
Tapering, I think for this reason, has been one of the most reliable ways of recovering from an addiction/dependence. I believe it’s been shown to have a 70% success rate.
Interestingly, the first study I found on tapering, which is testing tapering strips in assistance of discontinuing antidepressant use, also says 70% https://journals.sagepub.com/doi/full/10.1177/20451253211039327
Every site I read on reducing alcohol dependency with tapering said something similar, back in the day.
Wouldn’t this be useful only if one knows for certain their ‘brain and motivation/values’ are not already ‘hacked’ beforehand?
Otherwise it would just strengthen the pre-existing ‘hacks’.
The exploits I’m aware of that could make someone chain themselves into remaining vulnerable or compromised are ones that drive people to continue exposure to high-risk environments that facilitate more exploits. For example, continuing to use social media or leaving webcams uncovered.
This is why the EV of ceasing social media use and covering up webcams is so high; they facilitate further manipulation to keep you and your friends vulnerable.
EDIT: I also think it’s worthwhile to think of things like planting ideas in people’s heads or setting them up to react a certain way if specific conditions are met. For example, persuading them that caring exclusively about their friends and family, instead of the future, is part of the maturation process, or insinuating that Yudkowsky is evil, preventing them from reading the Sequences or contributing to AI safety.
I tend to think that it will primarily chain them back to social media, because that’s where the magic happens (especially because humans on the smarter end will inevitably become OOD over time, and hopefully become truer to themselves).
I assume you do not have a mathematical proof of that, or you’d have mentioned it. What makes you think it is mathematically provable?
I would be very interested in reading more about the avenues of research dedicated to showing how AI can be used for psychological attacks from the perspective of AIS (I’d expect such research to be private by default due to infohazards).
Yes, I thought for years that the research should be private, but as it turns out, most people in policy are pretty robustly not-interested in anything that sounds like “mind control”, and the math is hard to explain. So if this stuff ends up causing a public scandal that damages the US’s position in international affairs, it probably won’t originate from here (e.g. it would get popular elsewhere, like the AI surveillance pipeline), so AI safety might as well be the people that profit off of it by open-sourcing it early.
It’s actually a statistical induction. When you have enough human behavioral data in one place, you can use gradient descent to steer people in measurable directions if the people remain in the controlled interactive environment that the data came from (and social media news feeds are surprisingly optimized to be that perfect controlled environment). More psychologists mean better quality data-labeling, which means people can be steered more precisely.
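As a toy illustration (and only that) of “gradient descent on behavioral data”, here is a minimal logistic regression trained by stochastic gradient descent to predict a measured binary response from synthetic feature vectors; the data and its interpretation are entirely invented:

```python
import math
import random

def train_logistic(data, labels, lr=0.5, epochs=200):
    """Toy SGD on logistic regression: learn weights that predict a
    measured 0/1 response (e.g. "engaged" vs "didn't") from features."""
    w = [0.0] * len(data[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability
            g = p - y                        # gradient of log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Synthetic data: feature 0 actually drives the response, feature 1 is noise.
rng = random.Random(0)
data = [[rng.random(), rng.random()] for _ in range(200)]
labels = [1 if x[0] > 0.5 else 0 for x in data]
w, b = train_logistic(data, labels)  # learns to weight feature 0 heavily
```

The only point of the sketch is the one in the comment above: given enough labeled behavioral data, plain gradient descent finds whichever measurable features predict the response, with no understanding of the person involved.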
(I think this’d have benefited from explaining what a clown attack is much earlier in the post)
Fixed, thanks.
I was still confused when I opened the post. My presumption was that “clown attack” referred to a literal attack involving literal clowns. If you google “clown attack,” the results are about actual clowns. I wasn’t sure if this post was some kind of joke, to be honest.
I strongly disbelieve that discussion of Snowden’s findings has been limited to mostly fringe right-wing platforms. He has widely watched interviews with John Oliver, MSNBC, and Vice, and has been covered favorably by news outlets such as the Guardian, the New York Times, and NPR.
Yes, in that case, I was thinking about that as a worrying trend and an example that would make it easier to see what I’m talking about (i.e. intuition flooding); sorry if I implied that western media was currently doing a bad job covering Snowden. I had a pretty good experience using the large western media outlets to read about the Snowden revelations.
You sometimes hear an argument like this in conspiracy theory groups. It goes something like this:
“My own pet conspiracy theory is sensible! But all the other conspiracy theories on here, they’re completely stupid! Nobody could possibly believe that! In fact, I think they’re all undercover agents sent by the government to make conspiracy theorists look stupid. Oh, wait, that’s also a conspiracy theory, isn’t it? Yes, I believe that one.”
South Park used that one: in one episode, it turns out that the 9/11 conspiracy theorists were actually government agents trying to make the government seem competent.
This post was difficult to take seriously when I read it but the “clown attack” idea very much stuck with me.
I think most “clown attacks” are performed by genuine clowns, not by competent intelligence agencies.
Does this make them better? Not really.
It’s also an attack that’s hard to pull off, especially against a plausible sounding idea that has been endorsed by someone high status.
Did we see an attempt at a clown attack against the lab leak hypothesis? Probably. Not a very successful one, but one that kind of worked for a few months, because intelligence agencies aren’t that competent.
Yes, plausible deniability and the very high ratio of ambient/noise clowns are probably among the main things that make clown attacks powerful, and they combine well with user-data-based targeted influence systems (because attackers can automate the process of trying various things until they find manipulation strategies that work well on the kinds of people who are ordinarily difficult to persuade).
I’d argue that plausible deniability makes clown attacks easy to pull off, and that if a clown attack was used to deny people cognition about the lab leak hypothesis, then it was wildly successful and still is; lab leak probably won’t be one of the main issues in the 2024 election even though it is naturally more relevant than all the other issues combined. That’s the kind of thing that becomes possible with modern AI-powered psychological research systems, although the vastly weaker 20th-century psychological research paradigm might have been sufficient there too.
lc and I have both written high-level posts about evaluating intelligence agency competence; it remains an open question since you would expect large numbers of case studies of incompetence regardless of the competence of the major players at the top 5-20%.
In a world in which the replication attempts went the other direction and social priming turned out to be legit, I would probably agree with you. But even in controlled laboratory settings, human behavior can’t be reliably “nudged” with subliminal cues. The human brain isn’t a predictable computer program for which a hacker can discover “zero days.” It’s a noisy physical organ that’s subject to chaotic dynamics and frequently does things that would be impossible to predict even with an extremely extensive set of behavioral data.
Consider targeted advertising. Despite the amount of data social media companies collect on their users, ad targeting still sucks. Even in the area of attempted behavior manipulation that’s subject to more optimization pressure than any other, companies still can’t predict, let alone control, their users’ purchasing decisions with anything close to consistency. Their data simply isn’t sufficient.
What would it take to make nudges actually work? Even if you covered the entire surface of someone’s living area with sensors, I doubt you’d succeed. That would just give you one of the controlled laboratory environments in which social priming still failed to materialize. As mentioned above, the brain is a chaotic system. This makes me think that reliably superhuman persuasion at scale would be impractical even for a superintelligence, aside from with brain-computer interfaces.
Scott Alexander wrote a really good article on this; lots of people right now are falsely concluding that nudges don’t exist. Furthermore, the replication crisis is an issue within academic psychology, which I argued in this post is obsolete and should stop anchoring our understanding of how effective human manipulation can become in the 2020s. I cover some of this in the “alien” paragraph.
I agree that the vast majority of people attempting to do targeted advertising do not have sufficient data. But that doesn’t tell us much about whether the big 5 tech companies, or intelligence agencies, have sufficient data to do it, and simply aren’t being loud about it. My argument is that, due to the prevalence of data theft and data poisoning, it’s entirely plausible that sufficient data is being monopolized by extremely powerful people, as the profit motive alone is enough for that, let alone a new era of controlling people. I’ve met several people who tried to do predictive analytics for campaign staff; each of them had grossly insufficient data compared to the big 5 tech companies, and yet they were confident that the big 5 tech companies couldn’t do it either. It really shouldn’t be surprising that this mentality would be prevalent whether or not the human-steering tech actually exists.
I highly doubt that the signal-to-noise ratio of the human brain makes predictive analytics impossible, let alone with sample sizes in the millions or even billions. One of my main arguments is that there are in fact millions or billions of sensors pointed at millions of people (although of course telecom security might limit the ability of any actor anywhere to acquire enough unpoisoned data). Clown attacks are an example; the human thought process is indeed sloppy, but it is sloppy in consistent ways, and these consistent ways can easily be discovered and exploited when you have such large amounts of interactive behavioral data. Gradient descent plus social media algorithms alone can steer people’s thinking in a given direction, insofar as movement in that direction is measurable.
I think it is theoretically possible that humans are extremely resistant to manipulation of all kinds, but I’d argue that the bar for proving or indicating that is much higher than you seem to think here, especially in the current experiment-based paradigm where only powerful people get access to enough useful data and the data centers required to process it. (It’s possible that this level of research is less centralized and more accessible than I thought; JP Morgan’s experiments on their employees surprised me. But I also wouldn’t be surprised to find that the NSA poisoned JP Morgan’s data sets because they took issue with major banks gaining these capabilities.)
If any of the big tech companies had the capability for actually-good targeted advertising, they’d use it. The profit motive would be very strong. The fact that targeted ads still “miss” so frequently is strong evidence that nobody has the highly advanced, scalable, personalized manipulation capabilities you describe.
Social media recommendation algorithms aren’t very powerful either. For instance, when I visit YouTube, it’s not unusual for it to completely fail to recommend anything I’m interested in watching. The algorithm doesn’t even seem to have figured out that I’ve never played Skyrim or that I’m not Christian. In the scenario in which social media companies have powerful manipulation capabilities that they hide from the public, the gap between the companies’ public-facing and hidden recommendation systems would be implausibly large.
As for chaotic dynamics, there’s strong experimental evidence that they occur in the brain, and even if they didn’t, they would still occur in people’s surrounding environments. Even if it weren’t prohibitively expensive to point millions or billions of sensors at one person, that still wouldn’t be enough to predict everything. But tech companies and security agencies don’t have millions or billions of sensors pointed at each person. Compared to the entirety of what a person experiences and thinks, computer use patterns are a very sparse signal even for the most terminally online segment of the population (let alone your very offline grandma). Hence the YouTube algorithm flubbing something as basic as my religion—there’s just too much relevant information they don’t have access to.
Making people feel unsafe is a great way to lose users. During the mid-2010s, people did notice advertisements that predicted what they wanted before they gave any indication that they wanted it, and they reacted in an extremely measurable way (quitting the platform). Even the most mundane systems you could expect would notice a dynamic like that and default to playing conservatively and reducing risk, e.g. by randomly selecting ads. People who use LW or are otherwise unusually involved with AI (or even people with more than Statistics 101) probably stopped getting truly targeted ads years ago, because with sample sizes in the millions you can anticipate that kind of problem from a mile away.
Likewise, if a social media platform makes you feel safe, that is basically no indicator at all of how effective gradient descent is at creating user experiences that prevent quitting.
Can you go into more detail about your chaotic dynamics point? Chaotic dynamics don’t seem to prevent clown attacks from succeeding at a sufficiently high rate, and the whole point of the multi-armed bandit algorithms I’ve described is that any individual manipulation technique is expected to have a decent failure rate. That redundancy is why using social media for an hour a day is dangerous while looking at a computer screen once is safe. These systems require massive amounts of data on massive numbers of people in order to work, and that is what they’re getting (although I’m not saying that a gait camera couldn’t make a ton of probabilistic inferences about a person from only 10 seconds of footage).
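The redundancy argument above can be sketched as a toy epsilon-greedy bandit (a hypothetical illustration, not anything described in the post): every “arm” is a technique that fails more than 90% of the time, yet repeated trials still identify and exploit the best one. The success rates and trial counts below are invented for the example.

```python
import random

random.seed(42)

# Hypothetical arms: each "technique" succeeds rarely, at different rates.
success_rates = [0.02, 0.05, 0.08, 0.03]   # every arm fails >90% of the time
counts = [0] * len(success_rates)
values = [0.0] * len(success_rates)        # running mean reward per arm
epsilon, n_trials = 0.1, 20000

for _ in range(n_trials):
    if random.random() < epsilon:
        arm = random.randrange(len(success_rates))                    # explore
    else:
        arm = max(range(len(values)), key=lambda a: values[a])        # exploit
    reward = 1.0 if random.random() < success_rates[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

# Despite every individual attempt usually failing, repetition across many
# trials is enough to converge on the most effective technique.
best = max(range(len(values)), key=lambda a: values[a])
print(best, counts)
```

This is the sense in which a high per-attempt failure rate is compatible with reliable aggregate steering: one screen-glance is one pull of one arm, an hour a day is thousands of pulls.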
If I had read your comment before writing the post, I wouldn’t have used the word “zero days” nearly so frequently in this post, or even at all, because you’re absolutely right that the exploits I’ve described here are very squishy and unreliable in a way that is very different from how zero days are generally understood.
There was that story about the girl that got ads for baby stuff before her parents knew about the pregnancy… ;)
Relevant smbc: “The best way to ruin a protest is to join it badly.”
Per the recent Nightshade paper, clown attacks would be a form of semantic poisoning of specific memeplexes, where “memeplex” basically describes the architecture of some neural circuits. At inference time, those memeplexes produce something designed to propagate themselves (a defence or description of some idea, i.e. a submeme), and a clown attack would make that propagation less effective at transmitting to, e.g., specific audiences.
Possibly related: I think Yann LeCun is doing an excellent job of alerting people to the potential dangers of AI, by presenting conspicuously bad arguments to the effect of, “don’t worry, it’ll be fine.”
This post is way too long. Forget clown attacks, we desperately need LLMs that can protect us from verbosity attacks.
Fixed! Rerendered as an intuitive 4-minute read here: https://www.lesswrong.com/posts/aWPucqvJ4RWKKwKjH/4-min-read-an-intuitive-explanation-of-the-ai-influence
I wrote two shorter versions, one on the AI governance/US China side of things, and an even shorter summary of the overall situation.
I really regret making this post as long as EY’s List of Lethalities. I thought that, since the Cyborgism post was also long and got tons of reads because it was worth reading, this would be fine, but it didn’t go like that. I still think the situation is an emergency.
What do you think: could AI-powered mind hacks be so powerful that they would themselves be an x-risk? For example, AI-generated messages that dissolve a person’s value system and core beliefs, or even install an AI on wetware?
Also, effective wireheading via AI-powered games and the like would be a form of mind hack.
Theoretically, it could be an X-risk, but it’s not neglected, the same way that climate change tipping points aren’t neglected. Maybe it could even reduce the required length of a pause by increasing the pace that our civilization approaches dath ilan or something like that. I’m mainly thinking about it as a roadmap for the 2020s, for any operation happening right now (e.g. AI alignment) that will probably take more than 10 years.
In the middle of the doc, I wrote:
Does this mean you think intelligence agencies and/or governments are deliberately promoting the degrowth movement in order to discredit the idea of AGI x-risk?
If so, why do you think they are doing that?
And how do you think they are doing that? (For example, is the CIA secretly funneling dark money to organizations that promote degrowth?)
No, sorry, I’m not predicting that that is true. I thought it was an evocative example, and evocative examples are generally helpful for transmitting concepts (Yudkowsky does this).
Strongly agree. To my utter bewilderment, Eliezer appears to be exacerbating this vulnerability by making no efforts whatsoever to appear credible to the casual person.
In nearly all of his public showings in the last 2 years, he has:
Rocked up in a trilby
Failed to adequately introduce himself
Spoken in condescending, aloof and cryptic tones; and
Failed to articulate the central concerns in an intuitive manner
As a result, to the layperson, he comes off as an egotistical, pessimistic nerd with fringe views—a perfect clown from which to retreat to a “middle ground”, perhaps offered by the eminently reasonable-sounding Yann LeCun—who, after all, is Meta’s chief AI scientist.
The alignment community is dominated by introverted, cerebral rationalists and academics, and consequently, a common failure is to ignore the significance of image as either a distraction or an afterthought.