Is AI Safety dropping the ball on privacy?
The lack of privacy-preserving technologies facilitates better predictive models of human behavior. This accelerates several existential epistemic failure modes by enabling higher levels of deceptive and power-seeking capabilities in AI models.
What is this post about?
This post is not about things like government panopticons, hiding your information from ‘the public internet’, blurring your face on online videos, hiding from people who might look you up on Google or Facebook, or hackers getting access to your information, etc… While these issues might also be problematic, they do not pose x-risks in my mind.
This post is about things like surveillance capitalism or technofeudalism leading to an unregulatable and eventually uncontrollable Robust Agent Agnostic Process (RAAP). This causes an increasing disconnect between the perception of reality and true on-ground reality which ultimately leads to epistemic failure scenarios by enabling deceptive or power-seeking behavior to go unchecked for too long.
So overall, for right now all I am talking about is—potentially delaying timelines and shielding your mind against the current and future manipulation/deception of AI models by limiting the flow of information that can be tied to your unique identity.
Aren’t there bigger concerns?
Data privacy does not directly solve most problems. It only directly affects a subset of a specific scenario—The AI is trying to deceive you or change your preferences. If the AI wants to directly just straight up paperclip you then data privacy doesn’t really help much. However, it is another potential tool that we can use, similar to how the interpretability of DL models can help in the bigger broader alignment picture.
The reason I am writing this post is that I have observed that privacy seems to be relatively neglected in the space. There are near-negligible levels of concern about data privacy relative to other tools that are talked about as being helpful to the safety ecosystem. There are some conversations about—How much should an AI system be allowed to modify or influence your pre-existing preference distribution?, or, When do we cross the line from a model ‘informing’ us of better pathways to ‘deception’ to ‘preference modification’? but when it comes to the underlying nontechnical agent-agnostic processes like data gathering and hoarding that enable deceptive behavior I hear people saying things like—it is a concern for AI Ethics but not necessarily AI Safety since it does not pose x/s-risks.
I suppose at the end of the day people basically consider it a question of timelines. If you feel like we are going to see AGI before the end of the decade then there might be bigger fires to put out. Perhaps I have misunderstood the perspectives, but I will try to make a case for why today’s lack of privacy-preserving tech at the very least increases the ability of an AGI/ASI to either deceive us or change our preferences to align its preferences instead. This increases the risk of an existential failure both in the near and the long term. This means that it is directly influencing those ‘bigger concerns’ that you might already have and is thereby deserving of at least a little more attention from the alignment community.
What do I mean by epistemic failure?
So I need to pass on the vibe of what I am talking about when I use ‘epistemic failure’ in this post. I am generally trying to invoke some of the following kinds of scenarios:
Value-Lock-in: An AI might optimize and enforce a set of human values it has learned to be fitting for the present time. However, these values might not apply to all future generations of humans. A historical example: slavery was considered acceptable for most of history, but by current value standards it is reprehensible.
Value-Degradation: Human values and preference distributions are dynamic and since alignment works both ways, we are susceptible to being deceived and having our preference distribution changed to match whatever any other agent wants it to be. An AI agent gaining more influence over the world results in us being increasingly aligned with the AI’s preferences as opposed to vice versa.
Value-Drift: This is basically Paul Christiano’s whimper scenario. There is a slow continued loss of epistemic hygiene over time due to our reliance on proxies to measure reality. This is combined with a lack of desire to meaningfully act against or regulate AI because we are distracted by a cornucopia of wealth and AI-enabled products and services. Human reasoning gradually stops being able to compete with sophisticated, systematized manipulation and deception and we ultimately lose any real ability to influence our society’s trajectory. Past this point, the majority of human values are decided by AI models.
Deceptive AI: This is your run-of-the-mill inner misalignment/goal misgeneralization with goals extending beyond the training phase kind of scenario. It might be debatable if it is an ‘epistemic failure’ on the part of humans if we have an adversarial agent intentionally deceiving us. Either way, this failure scenario is still influenced by what I will talk about in this post.
How does a lack of privacy enable deception?
One of the underlying requirements of being manipulated is that the manipulator has to have a good simulation of you. Even when given only a limited amount of data about each other, humans can be very successful at manipulating each other’s preferences. Just this low-resolution simulation that humans have of each other has been enough to manipulate preference distributions to the extent that we can convince humans to kill each other, or even themselves en masse. What does this have to do with privacy? More data about you = a better simulation of you. So by constantly giving up data to companies we are openly consenting to being potentially deceived or manipulated both now and in the future.
As soon as a single bit of data is generated and transferred out of your local machine, it becomes insanely hard to actually have it ‘deleted’. The best you can hope for is it just gets removed from the public domain. But it still remains on internal databases because of either legal compliance reasons, or, because it simply got lost in the innards of the labyrinthian systems, which means even the developers have no clue where the data is anymore.
It doesn’t help that data is the lifeblood of most big-tech firms. So they tend to be notorious for knowingly muddying the waters in various ways. They do this because under no circumstances do they want to stem the flow of data. They instead want to convince you to let them be the custodians of your data. Why? because data is so damn valuable. One example is confusing the terms secure and private. Let’s use Google as an example. Look at this statement from google docs on ‘privacy’:
Keeping you safe online means protecting your information and respecting your privacy. That’s why, in every product we make, we focus on keeping your information secure, treating it responsibly, and keeping you in control. Our teams work every day to make Google products safe no matter what you’re doing—browsing the web, managing your inbox, or getting directions. … Your information is protected by world-class security. You can always control your privacy settings in your Google Account. …When you use Google Docs, Sheets, & Slides, we process some data to offer you better experiences in the product. Your information stays secure. …
I am not (as) worried about 2FA, encryption, and about my data getting hacked away from Google servers. I am also not (as) worried about you selling my data. I am extremely worried about you having a central database of my data at all.
More examples of induced confusion around the core issue include everything that I mentioned in the earlier paragraph detailing what is this post not about. You have no idea how often I have heard—”who do you think is trying to hack you?” as a response when I say I am concerned about the lack of online privacy. So please keep in mind that security and privacy are not the same things.
Anything that serves as a unique identifier—name, phone number, email address, IP, MAC, IMEI, browser fingerprint, etc… can all be used as foreign keys into another database. On a join of these databases, there is effectively a centralized repository of literally all your information public or private. The accumulated data in just a couple of the top few large central databases effectively gives whoever has access to it a complete understanding of your psyche. This controlling entity will have a better understanding of your motivations (and how to change them) than even you do. While no such super-database exists today (I hope) it would be trivial for an ASI to create such a database by either creating shell companies to broker deals with multiple large data providers, or just simply hacking the existing data centers.
The closer we get to a capabilities race, the higher the likelihood of a company throwing caution to the wind, and training on literally all the data they can get their hands on. This includes data that they have stored on their servers but is not open to the public. So just saying that ‘no humans have access’ to your data, or ‘it is not on the public internet’ does not address my concerns.
Do we have examples?
It has been almost a decade since Facebook itself came out with a paper giving empirical evidence that mass psychology can be manipulated using social networks. Since then recommendation algorithms have permeated and been embedded into almost every facet of our lives. BingGPT or Sydney or whatever you want to call it will just supercharge this existing problem.
People only see the cute RLHFd face of the shoggoth that is Sydney. These systems can already get people to fall in love with LLMs and have even people in AI Safety engineering saying stuff like ‘I’ll let her paperclip me’ and ‘I was not prepared for pre-superhuman AIs hacking through my psyche like butter.’ Now, combine this with the upcoming deluge of data due to mass scale deployment of IOT devices and integration of AI models directly into search engines. AI models are already being touted as potential alternatives to trusted relationships such as a therapist or mental health assistants. AI-Integrated Dating apps are asking increasingly invasive questions about your life to help better train ML recommendation models. The apps that use ML-based recommendations routinely outcompete others in the field that don’t.
Every single aspect of this is just making it easier to manipulate not just our preferences but our entire information environment to be whatever Sydney or Bert and Ernie (or any other future AI) decide that it should be. If this continues unabated in the current fashion, you better gird your loins because we are on a rollercoaster ride headed straight for the epistemic failure station.
Questions and Possible Solutions
So why is it that few if any people in AI Safety are advocates for privacy-preserving tech? At the very least we could try to mitigate the flood of data to the largest central repositories of information that are likely to be integrated into LLMs—Google, Microsoft, and Facebook.
Instead of trusting that companies don’t fall prey to temptation, we can just remove their ability to access data by using end-to-end encrypted services and taking responsibility for our own keys. Additionally, we can remove even more of the temptation by favoring distributed storage (e.g IPFS) combined with encryption and privacy through an I2P tunnel or something. Perhaps, advocate towards ensuring that we should be able to run (not train) all models locally (assuming we have the hardware) without sending prompts through the internet back to these companies.
None of these are concrete ideas yet, I am just writing/thinking out loud. Basically what I am trying to ask is—Are the claws of these technologies already so far ingrained that we cannot even possibly consider avoiding them? or is the loss of convenience/capabilities to everyday life too high, and people are making a conscious choice? or are they simply unaware of the problem? This at the least is an easy governance proposal to gain more time and reduce deceptive capabilities. Am I missing something...?
This question was inspired by the post by Nate Soares telling people to focus on places where others are dropping the ball, Paul Christiano’s What failure looks like, Andrew Critch’s RAAPs post, and by the webtoon comic on AGI failure Seed by Said P.
Robust because light cones/information bubbles/intentional deception robustly lead to people caring about utility functions that are either proxies susceptible to Goodhart’s law, or, are just plain wrong relative to what they truly care about. Agent Agnostic because it does not matter who executes the process, it could be a microoglebook or an ASI, or any other entity that just wanted to provide better recommendations.
We are not there yet, but it doesnt seem completely outlandish to imagine a world where you have genomics data from companies like 23andMe integrated with either online dating platforms or even digital id. An AI model uses this to recommend every relationship that you have in your life both platonic and non-platonic. Which is like effectively giving AI models control over humanity’s genetic future. Again just thinking of possible potential failures. We are nowhere close to this world state yet.
Google is an ad company that occasionally also provides search results. Combine this with the fact that the capability race is already underway, and that if they face any more ‘code-reds’, it is quite likely that they will use every single bit that they can get their hands on to train BARD if they can get away with it. Google only encrypts data in transit (SSL) and at rest. Basically, unless you are a business or enterprise customer and have actively chosen to take custody of your own keys, google owns the decryption keys to all data stored on Google servers. All this being said, I in no way mean to imply that it would be trivial to just decrypt and use all customer data. There is a huge amount of red tape and people internal to these organizations that would attempt to stop this (I think).
The Microsoft OS has gone as far as installing a keylogger on all systems by default.
New versions of Microsoft office are so bad at preserving privacy that they are literally banned from German schools.
Microsoft through their Copilot has also shown a willingness to use code that was under a copyleft license (GPL, MIT, AGPL, etc…) to train without open-sourcing their model.
Overall Microsoft seems to want ever more data and has tended towards increasing non-optional telemetry over the years. For a company like Microsoft, any lawsuits or fines are barely even a slap on the wrist for how much potential revenue they stand to gain for effectively gaining a permanent subscription from millions of developers who find CoPilot indispensable to their workflow. GPT-4 is going to be included in office suite products, combining telemetry data and feedback from every user of stuff like Word and Excel with stuff like copilot sending your code to Microsoft makes GPT-4/5 look quite scary.
I am just listing the essentials. You can add Amazon with their Stability foundation models, or, Baidu with ERNIE, etc… to the list if you want depending on your tolerance of loss to convenience.
I don’t think things like differential privacy or federated learning would affect the types of issues I am talking about but if you have arguments for it I would love to hear them.
Reminder: no, it didn’t, as you can see by scrolling the relevant WP section about impact (ie. consequences), not the irrelevant scandal section. We know from actual political advertising experiments their data was far too crude to make any impact (not that anything they said wasn’t a tissue of advertising & lies), and they weren’t even used by the Trump campaign!
Thanks for pointing that out! It’s embarrassing that I made a mistake, but it’s also relieving in some sense to learn that the impacts were not as I had thought them to be.
I hope this error doesn’t serve to invalidate the entire post. I don’t really know what the post-publishing editing etiquette is, but I don’t want to keep anything in the post that might serve as misinformation so I’ll edit this line out.
Please let me know if there are any other flaws you find and I’ll get them fixed.
I don’t think this invalidates the point that microtargeting can be very effective.
Strongly upvoted for the clear write-up, thank you for that, and engagement with a potentially neglected issue.
Following your post I’d distinguish two issues:
(a) Lack of data privacy enabling a powerful future agent to target/manipulate you personally, because your data is just there for the taking, stored in not-so-well-protected databases, cross-reference is easier at higher capability levels, singling you out and fine-tuning a behavioral model on you in particular isn’t hard ;
(b) Lack of data privacy enabling a powerful future agent to build that generic behavioral model of humans from the thousands/millions of well-documented examples from people who aren’t particularly bothered by privacy, from the same databases as above, plus simply (semi-)public social media records.
From your deception examples we already have strong evidence that (b) is possible. LLM capabilities will get better, and it will get worse when [redacted plausible scenario because my infohazard policies are ringing].
In (b) comes to pass, I would argue that the marginal effort needed to prevent (a) would only be useful to prevent certain whole coordinated groups of people (who should already be infosec-aware) to be manipulated. Rephrased: there’s already a ton of epistemic failures all over the place but maybe there can be pockets of sanity linked to critical assets.
I may be missing something as well. Also seconding the Seed webtoon recommendation.
I did consider the distinction between a model of humans vs. a model of you personally. But I can’t really see any realistic way of stopping the models from having better models of humans in general over time. So yeah, I agree with you that the small pockets of sanity are currently the best we can hope for. It was mainly to spread the pocket of sanity from infosec to the alignment space is why I wrote up this post. Because I would consider the minds of alignment researchers to be critical assets.
As to why predictive models of humans in general seems unstoppable—I thought it might be too much to ask to not even provide anonymized data because there are a lot of good capabilities that are enabled by that (e.g. better medical diagnoses). Even if it is not too heavy of a capability loss most people would still provide data because they simply don’t care or remain unaware. Which is why I used the wording—stem the flow of data and delay timelines instead of stopping the flow.
Does this mean we should stop making posts and comments on LessWrong?
I am trying to be as realistic as I can while realizing that privacy is inversely proportional to convenience.
So no, of course you should not stop making lesswrong posts.
The main things I suggested were—removing the ability to use data by favoring E2EE, and additionally removing the ability to hoard data, by favoring decentralized (or local) storage and computation.
As an example just favor E2EE services for collaborating instead of drive, dropbox, or office suite if you have the ability to do so. I agree that this doesn’t solve the problem but at least it gets people accustomed to thinking about using privacy-focused alternatives. So it is one step.
Another example would be using an OS which has no telemetry and gives you root access, both on your computer and on your smartphone.
There is a different class of suggestions that fall under digital hygiene in general, but as mentioned in the - ‘what is this post about’ section, that is not what this post is about. I am also intentionally avoiding naming the alternative services because I didn’t want this post to come across as a shill.
Also, this is all a question of timelines. If people think that AGI/ASI rears its head within the next years or decade, I would agree that there might be bigger fires to put out.
Let me explain my understanding of your model. An AI wants to manipulate you. To do that, it builds a model of you. It starts out with a probability distribution over the mind space that is its understanding of what human minds are like. Then, as it gathers information on you, it updates those probabilities. The more data it is given, the more accurate the model gets. Then it can model how you respond to a bunch of different stimuli and choose the one that gets the most desirable result.
But if this model is like any learning process I know about, the chart of how much is learned over time will probably look vaguely logarithmic, so once it is through half of the data, it will be way more than halfway through the improvement on the model. So if you’re thirty now, and have been using not end to end encrypted messaging your whole life and all that is on some server and ends up in an AI, you’ve probably already thrown away more than 90% of the game, whatever you do today. Especially since if you keep making public posts it can track changes in those to expect what changes are in you for its already good model anyway.
I keep going back and forth about whether your point is a good one or not. (Your point being that it’s useful to prevent non-public data about you from being easier to access by AIs on big server farms like Google’s or whatever, even if you haven’t been doing that so far, and you keep putting out public data.) Your idea sort of seems like a solution to another problem.
I do think your public internet presence will reveal much more to a manipulative AI than a manipulative human. AIs can make connections we can’t see. And a lot of AIs will be trained on just the internet as a whole, so while your private data may end up in a few AIs, or many if they gain the ability to hack, your public data will be in tons and tons of AIs. For say an LLM to work like this, it will have to be able to model your writing to optimize its reward. If you’re sufficiently different than other people that you need a unique way to be manipulated, your writing probably needs unique parameters to be optimized by an LLM. So if you have many posts online already, modern AIs probably already know you. And an AI optimized towards manipulation (either terminally or instrumentally) will just by talking to you or hearing you talk figure out who you are, or at least get a decent estimate of where you are in the mind space and already be saying whatever is most convincing to you. So when you post publicly, you are helping every AI manipulate you, but when you post privately, it only helps a few.
I understand your original comment a lot better now. My understanding of what you said is that open source intelligence that anyone provides through their public persona is revealing more than enough information to be damaging. The little that is sent over encrypted channels is just cherries on the cake. So the only real way to avoid manipulation is to first hope that you have not been a very engaged member of the internet for the last decade, and also primarily communicate over private channels.
I suppose I just underestimated how much people actually post stuff online publicly.
One first instinct response I had was identity isolation. That was something I was going to suggest while writing the original post as well. Practicing identity isolation would mean that even if you post anything publicly the data is just isolated to that identity. Every website, every app, is either compartmentalized or is on a completely different identity. Honestly, though that requires almost perfect OPSEC to not be fingerprintable. Besides just that, it’s also way too inconvenient for people to not just use the same email, and phone number or just log in with google everywhere. So even though I would like to suggest it, no one would actually do it. And as you already mentioned most normal people have just been providing boatloads of free OSINT for the last few decades anyway...
Thinking even more, the training dataset over the entire public internet is basically the centralized database of your data that I am worried about anyway. As you mentioned AIs can find feature representations that we as humans cant. So basically even if you have been doing identity isolation, LLMs (or whatever future model) would still be able to fingerprint you as long as you have been posting enough stuff online. And not posting online is not something that most people are willing to do. Even if they are, they have already given away the game by what they have already posted. So in a majority of cases, identity isolation doesn’t help this particular problem of AI manipulation either...
I have always tried to hold the position that even if it might be possible for other people (or AIs) to do something you don’t like (violate privacy/manipulate you), that doesn’t mean you should give up or that you have to make it easy for them. But off the top of my head, without thinking about this some more I can’t really come up with any good solution for people who have been publicly publishing info on the internet for a while. Thank you for giving me food for thought.