Is AI Safety dropping the ball on privacy?


The lack of privacy-preserving technologies facilitates better predictive models of human behavior. This accelerates several existential epistemic failure modes by enabling higher levels of deceptive and power-seeking capabilities in AI models.

What is this post about?

This post is not about things like government panopticons, hiding your information from ‘the public internet’, blurring your face in online videos, hiding from people who might look you up on Google or Facebook, or hackers getting access to your information. While these issues might also be problematic, they do not pose x-risks in my mind.

This post is about things like surveillance capitalism or technofeudalism leading to an unregulatable and eventually uncontrollable Robust Agent Agnostic Process (RAAP)[1]. Such a process causes an increasing disconnect between perceived reality and the true on-the-ground reality, which ultimately leads to epistemic failure scenarios by letting deceptive or power-seeking behavior go unchecked for too long.

So overall, for now, all I am talking about is potentially delaying timelines and shielding your mind against current and future manipulation or deception by AI models, by limiting the flow of information that can be tied to your unique identity.

Aren’t there bigger concerns?

Data privacy does not directly solve most problems. It only directly affects a subset of one specific scenario: the AI is trying to deceive you or change your preferences. If the AI just wants to paperclip you outright, data privacy doesn’t help much. However, it is another potential tool we can use, similar to how interpretability of DL models can help with the bigger alignment picture.

The reason I am writing this post is that I have observed that privacy seems to be relatively neglected in the space. Concern about data privacy is near-negligible compared to other tools that are discussed as helpful to the safety ecosystem. There are some conversations about questions like “How much should an AI system be allowed to modify or influence your pre-existing preference distribution?” or “When do we cross the line from a model ‘informing’ us of better pathways to ‘deception’ to ‘preference modification’?” But when it comes to the underlying nontechnical, agent-agnostic processes like data gathering and hoarding that enable deceptive behavior, I hear people say things like: it is a concern for AI Ethics but not necessarily AI Safety, since it does not pose x/s-risks.

I suppose at the end of the day people basically consider it a question of timelines. If you feel like we are going to see AGI before the end of the decade, then there might be bigger fires to put out. Perhaps I have misunderstood the perspectives, but I will try to make a case for why today’s lack of privacy-preserving tech at the very least increases the ability of an AGI/ASI to either deceive us or change our preferences to match its own. This increases the risk of an existential failure in both the near and the long term. It therefore directly influences those ‘bigger concerns’ that you might already have, and deserves at least a little more attention from the alignment community.

What do I mean by epistemic failure?

So I need to convey the vibe of what I am talking about when I use ‘epistemic failure’ in this post. I am generally trying to invoke some of the following kinds of scenarios:

  • Value-Lock-in: An AI might optimize and enforce a set of human values it has learned to be fitting for the present time. However, these values might not apply to all future generations of humans. A historical example: slavery was considered acceptable for most of history, but by current value standards it is reprehensible.

  • Value-Degradation: Human values and preference distributions are dynamic, and since alignment works both ways, we are susceptible to being deceived and having our preference distribution changed to match whatever another agent wants it to be. As an AI agent gains more influence over the world, we become increasingly aligned with the AI’s preferences rather than vice versa.

  • Value-Drift: This is basically Paul Christiano’s whimper scenario. There is a slow continued loss of epistemic hygiene over time due to our reliance on proxies to measure reality. This is combined with a lack of desire to meaningfully act against or regulate AI because we are distracted by a cornucopia of wealth and AI-enabled products and services. Human reasoning gradually stops being able to compete with sophisticated, systematized manipulation and deception and we ultimately lose any real ability to influence our society’s trajectory. Past this point, the majority of human values are decided by AI models.

  • Deceptive AI: This is your run-of-the-mill inner misalignment/goal misgeneralization scenario, with goals extending beyond the training phase. It is debatable whether it counts as an ‘epistemic failure’ on the part of humans if an adversarial agent is intentionally deceiving us. Either way, this failure scenario is still influenced by what I will talk about in this post.

How does a lack of privacy enable deception?

One of the underlying requirements of manipulation is that the manipulator has a good simulation of you. Even with only a limited amount of data about each other, humans can be very successful at manipulating each other’s preferences. The low-resolution simulations that humans have of each other have been enough to manipulate preference distributions to the extent that we can convince humans to kill each other, or even themselves, en masse. What does this have to do with privacy? More data about you = a better simulation of you. So by constantly giving up data to companies, we are openly consenting to being potentially deceived or manipulated, both now and in the future.
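To make “more data about you = a better simulation of you” a little more concrete, here is a deliberately crude toy model, not anything a real company is known to run: a person’s behavior in one domain is reduced to a single hidden probability of engaging with some kind of content, and an observer estimates it from logged interactions. The function name and all numbers are hypothetical.

```python
"""Toy model: more observations of a person give a sharper estimate of them.

Assumption for illustration only: behavior in one domain is reduced to a
single hidden probability of engaging with some kind of content. Real
behavioral models are vastly richer; this only shows the statistical
direction of the claim.
"""
import random


def mean_estimation_error(true_p: float, n_observations: int,
                          trials: int = 200, seed: int = 0) -> float:
    """Average absolute error of the estimated preference after n observations."""
    rng = random.Random(seed)
    total_error = 0.0
    for _ in range(trials):
        # Each logged interaction is an engage / don't-engage event.
        hits = sum(rng.random() < true_p for _ in range(n_observations))
        estimate = hits / n_observations
        total_error += abs(estimate - true_p)
    return total_error / trials


hidden_preference = 0.3  # hypothetical value the observer wants to learn
for n in (5, 50, 500):
    print(n, round(mean_estimation_error(hidden_preference, n), 3))
```

The average error shrinks roughly in proportion to 1/√n as the number of observations n grows. One number is a caricature of a person, but the direction is the same: every extra logged interaction sharpens the estimate.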

As soon as a single bit of data is generated and transferred out of your local machine, it becomes insanely hard to actually have it ‘deleted’. The best you can hope for is that it gets removed from the public domain. It still remains in internal databases, either for legal compliance reasons or because it simply got lost in the innards of labyrinthine systems, so that even the developers no longer know where the data is.

It doesn’t help that data is the lifeblood of most big-tech firms, so they tend to be notorious for knowingly muddying the waters in various ways. Under no circumstances do they want to stem the flow of data; instead, they want to convince you to let them be the custodians of your data. Why? Because data is so damn valuable. One example is conflating the terms secure and private. Let’s use Google as an example. Look at this statement from Google Docs on ‘privacy’:

Keeping you safe online means protecting your information and respecting your privacy. That’s why, in every product we make, we focus on keeping your information secure, treating it responsibly, and keeping you in control. Our teams work every day to make Google products safe no matter what you’re doing—browsing the web, managing your inbox, or getting directions. … Your information is protected by world-class security. You can always control your privacy settings in your Google Account. …When you use Google Docs, Sheets, & Slides, we process some data to offer you better experiences in the product. Your information stays secure. …

I am not (as) worried about 2FA, encryption, and about my data getting hacked away from Google servers. I am also not (as) worried about you selling my data. I am extremely worried about you having a central database of my data at all.

More examples of induced confusion around the core issue include everything I listed in the earlier paragraph on what this post is not about. You have no idea how often I have heard “Who do you think is trying to hack you?” in response when I say I am concerned about the lack of online privacy. So please keep in mind that security and privacy are not the same thing.

Anything that serves as a unique identifier (name, phone number, email address, IP, MAC, IMEI, browser fingerprint, etc.) can be used as a foreign key into another database. Join these databases on those keys and you effectively have a centralized repository of all your information, public or private. The accumulated data in just a couple of the largest central databases effectively gives whoever has access to it a complete understanding of your psyche. This controlling entity would understand your motivations (and how to change them) better than even you do. While no such super-database exists today (I hope), it would be trivial for an ASI to create one, either by creating shell companies to broker deals with multiple large data providers or by simply hacking the existing data centers.
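A minimal sketch of the join described above, assuming three hypothetical data sources that happen to share identifiers. Every record, field name, and address is made up (example.com and 203.0.113.0/24 are reserved for documentation); the point is only that a shared key is enough to merge otherwise separate datasets into one profile.

```python
"""Toy illustration: unique identifiers act as foreign keys across datasets.

All records, field names, and data sources below are hypothetical.
"""
from collections import defaultdict

# Hypothetical dataset A: a search provider's log, keyed by email.
search_logs = [
    {"email": "alice@example.com", "query": "insomnia remedies"},
    {"email": "alice@example.com", "query": "cheap flights to Lisbon"},
    {"email": "bob@example.com", "query": "used bikes"},
]

# Hypothetical dataset B: a retailer's purchase history, also keyed by email.
purchases = [
    {"email": "alice@example.com", "item": "sleep tracker"},
    {"email": "bob@example.com", "item": "bike lock"},
]

# Hypothetical dataset C: device telemetry keyed by IP, plus an
# email-to-IP mapping that links it to the other two sources.
telemetry = [{"ip": "203.0.113.7", "location": "Lisbon"}]
email_to_ip = {"alice@example.com": "203.0.113.7"}


def build_profiles(search_logs, purchases, telemetry, email_to_ip):
    """Join all sources on shared identifiers into one profile per person."""
    profiles = defaultdict(lambda: {"queries": [], "items": [], "locations": []})
    for row in search_logs:
        profiles[row["email"]]["queries"].append(row["query"])
    for row in purchases:
        profiles[row["email"]]["items"].append(row["item"])
    # Second hop: email -> IP -> telemetry row.
    ip_to_location = {row["ip"]: row["location"] for row in telemetry}
    for email, ip in email_to_ip.items():
        if ip in ip_to_location:
            profiles[email]["locations"].append(ip_to_location[ip])
    return dict(profiles)


profiles = build_profiles(search_logs, purchases, telemetry, email_to_ip)
print(profiles["alice@example.com"])
```

Adding another source only takes one more shared identifier, which is why each of them counts.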

The closer we get to a capabilities race, the higher the likelihood of a company throwing caution to the wind and training on literally all the data it can get its hands on. This includes data stored on its servers that is not open to the public. So just saying that ‘no humans have access’ to your data, or that ‘it is not on the public internet’, does not address my concerns.

Do we have examples?

It has been almost a decade since Facebook itself came out with a paper giving empirical evidence that mass psychology can be manipulated using social networks. Since then recommendation algorithms have permeated and been embedded into almost every facet of our lives. BingGPT or Sydney or whatever you want to call it will just supercharge this existing problem.

People only see the cute RLHF’d face of the shoggoth that is Sydney. These systems can already get people to fall in love with LLMs, and have even people in AI Safety engineering saying things like “I’ll let her paperclip me” and “I was not prepared for pre-superhuman AIs hacking through my psyche like butter.” Now combine this with the upcoming deluge of data from mass-scale deployment of IoT devices and the integration of AI models directly into search engines. AI models are already being touted as potential alternatives to trusted relationships, such as therapists or mental health assistants. AI-integrated dating apps are asking increasingly invasive questions about your life to better train ML recommendation models. Apps that use ML-based recommendations routinely outcompete others in the field that don’t.[2]

Every single aspect of this makes it easier to manipulate not just our preferences but our entire information environment into whatever Sydney or Bert and Ernie (or any other future AI) decide it should be. If this continues unabated in its current fashion, you had better gird your loins, because we are on a rollercoaster headed straight for the epistemic failure station.

Questions and Possible Solutions

So why is it that few if any people in AI Safety are advocates for privacy-preserving tech? At the very least we could try to mitigate the flood of data to the largest central repositories of information that are likely to be integrated into LLMs—Google[3], Microsoft[4], and Facebook[5].

Instead of trusting that companies don’t fall prey to temptation, we can remove their ability to access data by using end-to-end encrypted services and taking responsibility for our own keys[6]. We can remove even more of the temptation by favoring distributed storage (e.g., IPFS) combined with encryption, and privacy through an I2P tunnel or something similar. Perhaps we could also advocate for being able to run (not train) all models locally (assuming we have the hardware) without sending prompts over the internet back to these companies.
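A minimal sketch of what “taking responsibility for our own keys” means in practice: the key is generated and kept on the user’s machine, and the provider only ever stores ciphertext. A one-time pad (XOR with a fresh random key as long as the message) is used here only because it needs nothing beyond Python’s standard library; real end-to-end encrypted services use vetted authenticated ciphers such as AES-GCM, and a one-time pad key must never be reused. `server_storage` is a hypothetical stand-in for a provider’s database.

```python
"""Toy illustration of client-held keys: the server only ever stores ciphertext.

Assumption: a one-time pad is used purely because it needs only the standard
library. Real end-to-end encrypted services use vetted authenticated ciphers
(e.g. AES-GCM). A one-time pad key must never be reused.
"""
import secrets


def encrypt_locally(plaintext: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext). The key never leaves the user's machine."""
    key = secrets.token_bytes(len(plaintext))  # fresh random key, same length
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def decrypt_locally(key: bytes, ciphertext: bytes) -> bytes:
    """XOR is its own inverse, so the same key recovers the plaintext."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


# Hypothetical provider storage: a dict standing in for a remote database.
server_storage = {}

key, ciphertext = encrypt_locally(b"my private journal entry")
server_storage["doc-1"] = ciphertext  # the provider sees only random-looking bytes

# Only the key holder can read the document back.
assert decrypt_locally(key, server_storage["doc-1"]) == b"my private journal entry"
```

Contrast this with encryption at rest, where the provider generates and keeps `key` itself and can therefore decrypt whenever it chooses.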

None of these are concrete ideas yet; I am just writing/thinking out loud. Basically, what I am trying to ask is: are the claws of these technologies already so deeply ingrained that we cannot even consider avoiding them? Is the loss of convenience/capabilities in everyday life too high, so people are making a conscious choice? Or are they simply unaware of the problem? At the very least, this is an easy governance proposal to gain more time and reduce deceptive capabilities. Am I missing something...?


This question was inspired by the post by Nate Soares telling people to focus on places where others are dropping the ball, Paul Christiano’s What failure looks like, Andrew Critch’s RAAPs post, and by the webtoon comic on AGI failure Seed by Said P.

  1. ^

    Robust because light cones/information bubbles/intentional deception robustly lead people to care about utility functions that are either proxies susceptible to Goodhart’s law or just plain wrong relative to what they truly care about. Agent-agnostic because it does not matter who executes the process: it could be a microoglebook, an ASI, or any other entity that just wants to provide better recommendations.

  2. ^

    We are not there yet, but it doesn’t seem completely outlandish to imagine a world where genomics data from companies like 23andMe is integrated with online dating platforms or even digital ID, and an AI model uses it to recommend every relationship in your life, both platonic and non-platonic. That would effectively give AI models control over humanity’s genetic future. Again, I am just thinking through possible failures; we are nowhere close to this world state yet.

  3. ^

    Google is an ad company that occasionally also provides search results. Combine this with the fact that the capabilities race is already underway: if they face any more ‘code reds’, it is quite likely that they will use every single bit they can get their hands on to train Bard, if they can get away with it. Google only encrypts data in transit (SSL/TLS) and at rest. Basically, unless you are a business or enterprise customer who has actively chosen to take custody of your own keys, Google holds the decryption keys to all data stored on Google servers. All that being said, I in no way mean to imply that it would be trivial to just decrypt and use all customer data. There is a huge amount of red tape, and people internal to these organizations, that would attempt to stop this (I think).

  4. ^

    The Microsoft OS has gone as far as installing a keylogger on all systems by default.

    New versions of Microsoft Office are so bad at preserving privacy that they are literally banned from German schools.

    Through Copilot, Microsoft has also shown a willingness to train on code released under open-source licenses, including copyleft ones (GPL, AGPL) and permissive ones that still carry conditions such as attribution (MIT), without open-sourcing the resulting model.

    Overall, Microsoft seems to want ever more data and has tended towards increasingly non-optional telemetry over the years. For a company like Microsoft, any lawsuits or fines are barely a slap on the wrist compared to the revenue it stands to gain from effectively securing a permanent subscription from millions of developers who find Copilot indispensable to their workflow. GPT-4 is going to be included in Office suite products. Combining telemetry data and feedback from every user of tools like Word and Excel with Copilot sending your code to Microsoft makes GPT-4/5 look quite scary.

  5. ^

    I am just listing the essentials. Depending on how much loss of convenience you can tolerate, you can add Amazon with their Stability foundation models, Baidu with ERNIE, etc. to the list.

  6. ^

    I don’t think things like differential privacy or federated learning would affect the types of issues I am talking about, but if you have arguments otherwise I would love to hear them.