Do not be surprised if LessWrong gets hacked
Or, for that matter, anything else.
This post is meant to be two things:
a PSA about LessWrong’s current security posture, from a LessWrong admin[1]
an attempt to establish common knowledge of the security situation it looks like the world (and, by extension, you) will shortly be in
Claude Mythos was announced yesterday. That announcement came with a blog post from Anthropic’s Frontier Red Team, detailing the large number of zero-days (and other security vulnerabilities) discovered by Mythos.
This should not be a surprise if you were paying attention—LLMs being trained on coding first was a big hint, the labs putting cybersecurity as a top-level item in their threat models and evals was another, and frankly this blog post maybe could’ve been written a couple months ago (either this or this might’ve been sufficient). But it seems quite overdetermined now.
LessWrong’s security posture
In the past, I have tried to communicate that LessWrong should not be treated as a platform with a hardened security posture. LessWrong is run by a small team. Our operational philosophy is similar to that of many early-stage startups. We treat some LessWrong data as private in a social sense, but do not consider ourselves to be in the business of securely storing sensitive information. We make many choices and trade-offs in the direction that marginally favor speed over security, which many large organizations would make differently. I think this is reasonable and roughly endorse the kinds of trade-offs we’re making[2].
I think it is important for you to understand the above when making decisions about how to use LessWrong. Please do not store highly sensitive information in LessWrong drafts, or send it to other users via LessWrong messages, with the expectation that LessWrong will be robust to the maybe-upcoming-wave-of-scaled-cyberattacks.
LessWrong is not a high-value target
While LessWrong may end up in the affected blast radius simply due to its nature as an online platform, we do not store the kind of user data that cybercriminals in the business of conducting scaled cyberattacks are after. The most likely outcome of a data breach is that the database is scanned (via automated tooling) for anything that looks like account credentials, crypto wallet keys, LLM inference provider API keys, or similar. If you have ever stored anything like that in a draft post or sent it to another user via LessWrong DM, I recommend cycling it immediately.
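To make that concrete, here is a minimal sketch of the kind of automated credential scan an attacker might run over a leaked dump. The patterns are hypothetical illustrations (real tooling uses much larger pattern sets), not an authoritative list:

```python
import re

# Illustrative secret-shaped patterns only; real scanners ship hundreds of these.
PATTERNS = {
    "openai_style_api_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}

def scan(text: str) -> list:
    """Return the names of all credential-like patterns found in a blob of text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]
```

The point is that this kind of scan is cheap and fully automated, which is why anything credential-shaped in a draft or DM should be assumed found and cycled.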
It is possible that e.g. an individual with a grudge might try to dig up dirt on their enemies. I think this is a pretty unlikely threat model even if it becomes tractable for a random person to point an LLM at LessWrong and say “hack that”. In that world, I do expect us (the LessWrong team) to clean up most of the issues obvious to publicly-available LLMs relatively quickly, and also most people with grudges don’t commit cybercrime about it.
Another possibility is that we get hit by an untargeted attack and all the data is released in a “public” data dump. It’s hard to get good numbers for this kind of thing, but there are a few reasons for optimism[3] here:

From what I could find, probably well under half of data breaches result in datasets that get publicly circulated in any meaningful sense.
Many of those that do are “for sale”, not freely available. Someone with a chip on their shoulder might download a freely available dataset, but is much less likely to spend money on it (and also risk the eye of the state, if they then try to use that purchased data for anything untoward).
Datasets like this often don’t ever really “go away”, but they often do become unavailable, especially if they’re large. Storage is expensive, hosting sites generally take them down on request, torrenting is risky, and there isn’t much motive to keep re-uploading terabytes of data that you aren’t even selling. (Monetizable datasets tend to be stripped down and much smaller, but also wouldn’t include approximately any of the information that you might be concerned about here.)
FAQ
What “private” data of mine could be exposed in a breach?
Your email address(es)
A hashed version of your password
Your previous display name, if you’ve changed it (not technically a secret)
Analytics data about e.g. what pages you’ve visited on LessWrong, and in some cases what you’ve clicked on
Any information that may have come from your OAuth providers (Google, Github, Facebook)
Messages to other users
Draft posts and comments
Deleted comments
Draft revisions of published posts
Your frontpage tag filter settings
Your voting history
Your location data (if you provided it for e.g. being notified of nearby events)
Posts you’ve read
Your bookmarks
Posts you’ve hidden
Information you’ve given us to enable us to pay you money (if you provided it for e.g. Goodhart Tokens), such as a dedicated Paypal email address. (We do not store any e.g. credit card information that you would use to pay us money.)
Your notifications
Your account’s moderation history (if any)
Actions you’ve taken in previous Petrov Days
Your user agent and referer
Any messages you’ve sent to LLMs via one of the two embedded LLM chat features we’ve built, and responses received
Probably other things that aren’t coming to mind, though I’m pretty sure I’ve covered the big ones above. If you’re curious, our codebase is open source; you’re welcome to examine it yourself (or sic your own LLM on it).
Can I delete my data?
No*. Nearly all of the data we store is functional. It would take many engineer-months to refactor the codebase to support hard-deletion of user data (including across backups, which would be required for data deletion to be “reliable” in the case of a future data breach), and this would also make many site features difficult or impractical to maintain in their current states. Normatively, I think that requests for data deletion are often poorly motivated and impose externalities on others[4]. Descriptively, I think that most requests for data deletion from LessWrong would be mistakes if they were generated by concerns about potential data breaches. Separately, many data deletion requests concern publicly-available data (such as published posts and comments) that is often already captured by various mirrors and archives, and we don’t have the ability to enforce its deletion. I’ll go into more detail on my thinking on some of this in the next section of the post.
* If you are a long-standing site user and think that you have a compelling case for hard-deleting a specific piece of data, please feel free to message us, but we can’t make any promises about being able to allocate large amounts of staff time to this. e.g. we may agree to delete your DMs, after giving other conversation participants time to take their own backups.
Is LessWrong planning on changing anything?
We have no immediate plans to change anything. If the cost of auditing our own codebase falls below some threshold, that would motivate us to conduct a dedicated audit, but we are not quite there yet[5].
The Broader Situation
Epistemic status: I am not a security professional. I am a software engineer who has spent more time thinking about security than the median software engineer, but maybe not the 99th percentile. This section necessarily requires some extrapolation into the uncertain future.
A proper treatment of “what’s about to happen” really deserves its own post, ideally by a subject-matter expert (or at least someone who’s spent quite a bit more time thinking about this question than I have). I nonetheless include some very quick thoughts below, mostly relevant to US-based individuals who don’t have access to highly sensitive corporate secrets[6] or classified government information.
Many existing threat models don’t seem obviously affected by the first-order impacts of a dramatic increase in scalable cyber-offensive capabilities. Four threat models which seem likely to get worse are third-party data breaches, software supply chain attacks, ransomware, and cryptocurrency theft.
I’m not sure what to do about data breaches, in general. The typical vector of exploitation is often various forms of fraud involving identity theft or impersonation, but scaled blackmail campaigns[7] wouldn’t be terribly shocking as a “new” problem. One can also imagine many other problems cropping up downstream of LLMs providing scalable cognition, enabling many avenues of value extraction that were previously uneconomical due to the sheer volume of data. If you’re worried about identity theft, set up a credit freeze[8]. Behave virtuously. If you must behave unvirtuously, don’t post evidence of your unvirtuous behavior on the internet, not even under a very anonymous account that you’re sure can’t be linked back to you.
Software supply chain attacks seem less actionable if you’re not a software engineer. This is already getting worse and will probably continue to get worse. Use a toolchain that lets you pin your dependencies, if you can. Wait a few days after release before upgrading to the newest version of any dependency. There are many other things you can do here; they might or might not pass a cost-benefit analysis for individuals.
Scaled ransomware
Everybody is already a target. They want your money and will hold the contents of your computer hostage to get it.
This probably gets somewhat worse in the short-term with increased cybersecurity capabilities floating around. The goal of the attacker is to find a way to install ransomware on your computer. Rapidly increasing cybersecurity capabilities differentially favor attackers since there are multiple defenders and any one of them lagging behind is often enough to enable marginal compromises[9].
To date, scaled ransomware campaigns of the kind that extort large numbers of individuals out of hundreds or thousands of dollars apiece have not been trying to delete (or otherwise make inaccessible) backups stored in consumer backup services like Backblaze, etc[10]. My current belief is that this is mostly a contingent fact about the economic returns of trying to develop the relevant feature-set, rather than due to any fundamental difficulty of the underlying task.
As far as I can tell, none of the off-the-shelf consumer services like this have a feature that would prevent an attacker with your credentials from deleting your backups immediately. Various companies (including Backblaze) offer a separate object storage service, with an object lock feature that prevents even the account owner from deleting the relevant files (for some period of time), but these are not off-the-shelf consumer services and at that point you’re either rolling your own or paying a lot more (or both).
If you are concerned about the possibility of losing everything on your computer because of ransomware[11], it is probably still worth using a service like this. The contingent fact of scaled ransomware campaigns not targeting these kinds of backups may remain true. Even if it does not remain true, there are some additional things you should do to improve your odds:
Set your 2FA method to rely on TOTP, not a code sent by email or SMS.
Do not install the app generating TOTPs on your computer.
Do not check “Remember this browser” when entering your 2FA code to sign in to their website. If you’ve already done that, delete all the cookies in your browser for the relevant domains.
This increases the number of additional security boundaries the ransomware would need to figure out how to violate, in order to mess with your backups.
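For intuition about why keeping the TOTP app off your computer is a real security boundary: a TOTP code is just an HMAC of the current 30-second interval, keyed by a secret shared only between the service and the authenticator device, so ransomware on your machine has nothing local to steal. A minimal RFC 6238 sketch (standard library only):

```python
import base64
import hmac
import struct
import time

def totp(secret_b32, for_time=None, digits=6, step=30):
    """RFC 6238 TOTP (SHA-1 variant), the scheme authenticator apps implement."""
    key = base64.b32decode(secret_b32.upper())
    # Number of completed time steps since the Unix epoch.
    counter = int((time.time() if for_time is None else for_time) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), "sha1").digest()
    # Dynamic truncation: pick 4 bytes at an offset given by the last nibble.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Because the secret never leaves the authenticator device, an attacker who fully controls your browser and email still can’t mint valid codes.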
Scaled cryptocurrency theft
Everybody is already a target (since the attackers don’t know who might own cryptocurrency), but this mostly doesn’t matter if you don’t own cryptocurrency. The threat model here is similar to the previous one, except the target is not necessarily your computer’s hard drive, but anywhere you might be keeping your keys. I am not a cryptocurrency expert and have not thought about how I would safely custody large amounts[12] of cryptocurrency. Seems like a hard problem. Have you considered not owning cryptocurrency?
My extremely tentative, low-confidence guess is that for smaller amounts you might just be better off tossing it all into Coinbase. Third-party wallets seem quite high-risk to me; their security is going to be worse and you’ll have fewer options for e.g. recovery from equity holders after a breach. Self-custody trades off against other risks (like losing your keys). But this is a question where you can probably do better than listening to me with a couple hours of research, if you’re already in a position where it matters to you.
All of these probably deserve fuller treatments.
Habryka broadly endorses the contents of the “LessWrong’s security posture” section. Instances of the pronoun “we” in this post should generally be understood to mean “the members of the Lightcone team responsible for this, whatever this is”, rather than “the entire Lightcone team”. I’ll try to be available to answer questions in the comments (or via Intercom); my guess is that Habryka and Jim will also be around to answer some questions.
- ^
Me!
- ^
I won’t vouch for every single individual one, not having thought carefully enough about every single such choice to be confident that I would endorse it on reflection. Many such cases.
- ^
Which unfortunately are contingent on details of the current environment.
- ^
Though I won’t argue for that claim in this post, and it’s not load-bearing for the decision.
- ^
If you think you are qualified to do this (and are confident that you won’t end up spamming us with false-positives), please message us on Intercom or email us at team@lesswrong.com. We do not have a bug bounty program. Please do not probe our production APIs or infrastructure without our explicit consent. We are not likely to respond to unsolicited reports of security issues if we can’t easily verify that you’re the kind of person who’s likely to have found a real problem, or if your report does not include a clear repro.
- ^
This does unfortunately exclude many likely readers, since it includes lab employees, and also employees of orgs that receive such information from labs, such as various evals orgs.
- ^
We technically already have these, but they’re often targeting the subset of the population that is afraid of the attacker telling their friends and family that they e.g. watch pornography, which the attacker doesn’t actually know to be true (though on priors...) and also won’t do since they don’t know who your friends and family are. These attacks can become much scarier to a much larger percentage of the population, since personalization can now be done in a substantially automated way.
- ^
This won’t help with e.g. fraud against government agencies, or anything other than attackers opening financial accounts in your name.
- ^
This is not intended as a complete argument for this claim.
- ^
This is not the case for things like OneDrive/Dropbox/Google Drive, where you have a “sync” folder on your machine. It is also not the case for targeted ransomware attacks on large organizations of the kind that ask for 6-7 figures; those are generally bespoke operations and go through some effort to gain access to all of the backups before revealing themselves, since the backups are a threat to the entire operation.
- ^
Or hardware failure, or theft of your computer, or many other possibilities. But the further advice is specific to the ransomware case.
- ^
I’m not sure when the “hunt you down in person”-level attacks start. Maybe six figures? At any rate, don’t talk about your cryptocurrency holdings in public.
Are there air gapped backups, in case, say, someone with a grudge against rationalists or EAs decided to take Lightcone down and try to destroy every record they can? It would really suck to lose the whole history of LW. I don’t know what mirrors exist, or how vulnerable they might be to a determined attacker.
There are many many backups of public content (including things like archive.org and archive.is and other people who have taken their own backups).
I don’t think we have any air-gapped backups of private content, though I am sure I have some random old DB backups lying around in some random cloud drives somewhere, or an old laptop of mine.
I encourage anyone with files they’d rather not lose (photos, taxes, passwords, etc.) to start making rotating offline backups. Find some big-enough USB drives (flash or spinning are both fine) and buy ~5. Use a label maker or Sharpie to date each with its latest backup, and overwrite the oldest copy each time. Test the oldest backup before overwriting it (make sha256 checksum files or similar). Every year, or however often makes you feel comfortable, retire a backup drive and replace it with a new one in the rotation; the retired drive becomes an archive that you keep around indefinitely.
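A minimal sketch of the checksum step, assuming a simple one-directory-per-backup layout (file names and layout here are illustrative, not a prescription):

```python
import hashlib
from pathlib import Path

CHECKSUM_NAME = "SHA256SUMS"  # assumed convention for this sketch

def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_checksums(backup_dir: Path) -> Path:
    """Record a checksum for every file in the backup directory."""
    lines = [
        f"{sha256_file(p)}  {p.relative_to(backup_dir)}"
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file() and p.name != CHECKSUM_NAME
    ]
    out = backup_dir / CHECKSUM_NAME
    out.write_text("\n".join(lines) + "\n")
    return out

def verify_checksums(backup_dir: Path) -> bool:
    """Re-hash each file and compare against the recorded checksums."""
    for line in (backup_dir / CHECKSUM_NAME).read_text().splitlines():
        digest, rel = line.split("  ", 1)
        if sha256_file(backup_dir / rel) != digest:
            return False
    return True
```

Run `write_checksums` when you make a backup and `verify_checksums` on the oldest drive before overwriting it; a mismatch means bit rot or a failing drive, and you should retire that drive rather than reuse it.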
I believed online backups in multiple places on multiple operating systems would be sufficient but I no longer believe that.
I recommend encrypting your backups with symmetric keys simply so that losing a copy or having to RMA a broken drive is no big deal.
This seems too complacent to me. Any long-lived social media or communications utility should have some data retention policies which reduce the blast radius of an exploit and turn them into less of an endlessly growing radioactive waste dump of PII. I think this is especially true given how many people on LW have gone on to important positions or roles later in life (including in, say, cryptocurrency − 100% sufficient justification for meaningful hacking efforts); and remember the East Anglia or Hillary or Epstein emails, how badly even the most innocent communication could be abused by fanatics or fools or fraudsters? (I’ve been struck by how many of the ‘Epstein emails’ doing huge numbers on social media aren’t even real, and legitimated solely by the fact of a leak. In the postmodern oral culture, who bothers to factcheck anything, or so much as include a URL?)
Given how serious Mythos seems to be, that information leaks are irreversible, and that this is only going to escalate (remember, there’s usually a <=1 year lag from the best proprietary models to open source, so we may not even have until 2027 before mass attacks with zero guardrails or potential observability), it seems to me like this is a good time to implement a maximum retention period for DMs and purge all old DMs. I would suggest something like: announce via email to people with any DMs that all pre-2026 DMs will be deleted within one month, attach an export, and, going forward, delete all DMs after 1 year of inactivity.
(Airgapped LW2 backups should go without saying and already exist!)
I haven’t thought about the tradeoffs here that much, but I would be very sad if I were a user who forgot about the site for 1-2 years and came back expecting to find all my old DMs, only to discover they had all been deleted. I expect the online services I use to keep my data and not delete it, and I actively avoid any that don’t. I do not want to be in the habit of taking my own backups of all services I use.
That is why I said “and attach an export”.*
And personally I would rather a website delete my DMs than release them to the world. This is probably true of most of the people my DMs are with (whose opinion also matters).
* My reasoning here is that if old DMs have to live anywhere besides airgapped, physically-secured, encrypted backups, highly dispersed email accounts are the safest place: the major email providers are, in general, vastly more secure than LW2, are better equipped to respond rapidly to hacks, and have extensive controls to limit exfiltration; they all have early access to Mythos-class models to reduce damage early; and they are ‘too big to fail’ in the sense that if something like Gmail is cracked wide open and leaked, it will likely be such a global cataclysm that people really won’t be able to abuse the LW-related parts especially badly.
Ooh, interesting, I did fail to properly parse that you suggested directly attaching a DM export to the email. Yeah, that makes this less costly, though IMO still too annoying for anything I would want to use (of course I would prefer this over my DMs getting broadcast to the world, but really in almost any future I can see, the probability of anything like that still stays below 5%).
Compromise: if various other platforms experience the equivalent of DMs leaking (or you otherwise update that it’s >>5%), quickly do the gwern plan?
Something like that seems pretty reasonable.
You could keep any export that nobody downloaded in the airgapped archives, against some future day when you find a better point on the tradeoff curve.
There’s going to be so much of this over the coming years that I’m guessing people will be desensitized and stop giving a hoot.
That is a misunderstanding of how it works. They won’t ‘stop giving a hoot’ because it remains a useful weapon.
Not sure I understand you. So for example the US public seems pretty desensitized to the US executive having constant scandals and being blatantly corrupt, and that’s basically one big important actor doing lots of clearly really bad stuff. If there’s a constant flood of private communications being leaked I’d guess people would get really desensitized to it (as well as any particular leak being drowned out by the rest of the flood). So it would get less useful as a weapon because the public wouldn’t give as much of a hoot, is what I was trying to say.
Also noticing I shifted the goalposts here from “stop giving a hoot” to “wouldn’t give as much of a hoot”. I concede the original point as worded.
Example that maps better: people probably care less about things said by someone that aged poorly, since there’s been a huge flood of such things due to social media.
Doing a Carlini-style vulnerability analysis would seem relatively low-effort if you haven’t done that already.
We have to do something a little more annoying than that, since we don’t have unlimited (and un-rate-limited) Claude Code usage, but something like that is happening.
I don’t necessarily disagree with this, but this is a new and relatively unproven method which is only a partial solution to a much wider task. And IMO it’s overly specific for the general problem at hand.
Security of lesswrong.com is not the same as finding, for example, places in the source code which would allow SQL injections. There’s much more to security, and LW should adopt a security paradigm it can support, given constraints around headcount and funding.
I didn’t say it would be a complete solution.
Another corollary is that if you do want to have sensitive discussions with other LessWrong users, don’t exchange potentially sensitive information via private messages; switch to secure messengers like Signal instead.
Political organizing to stop AI development is potentially a target
Employees of AI companies that share nonprivate information are potentially a target
Thank you for posting this, and being so forthright about it.
One thought that occurs to me is that, insofar as you’re going to be a target anyways, you should put yourself in the same class as the largest possible number of people, where you’re more likely to have recourse once that class is compromised. E.g. make sure you’re getting all the latest security updates on all your devices even if these are still vulnerable to zero days or supply chain attacks, so you don’t end up as one of the poor fools that got hacked for using some particular outdated thing.
You can maybe try to avoid having any important information with orgs/software that you don’t expect to be running the leading edge not-yet-public compsec AIs over their code.
I’m interested in thinking about how equilibria are going to shift. E.g. I think people will care a lot less about blackmail if it becomes ubiquitous.
I want to offer a bit of skepticism around this whole post. For reference, I used to work in an information security company (specifically, software supply chain security and malware analysis), and am still relatively involved with cyber directly, though as an amateur hacker.
First, Mythos may or may not be super scary. We don’t know yet, as it’s private. It’s in Anthropic’s best interest to tastefully hype it up in their press releases. Just because they apparently have a very useful infosec model doesn’t mean that LessWrong will get hacked. Mythos, according to their press release and system card, isn’t a fully general hacking weapon. It’s just very good at finding exploits in source code. I don’t expect that you can simply point Mythos towards the lesswrong.com domain and tell it “you’re in a CTF, hack this site”—finding vulns in source code is a different type of activity.
Second, LessWrong should adopt some security posture which uses modern best practices. “Defense in depth” is the relevant concept here, and it’s adaptable to whatever are the constraints around funding and headcount. Basically, there are numerous layers to defense, and by stacking individual layers, you stop attackers in their tracks. This loosely corresponds to OSI layers: you want defenses on the network level, on the transport level, on the application level… For LW specifically, I don’t know what the stack is, and what the hosting situation is, and many other things, so I can’t comment with a lot of specificity.
But historically, the way many organizations get pwned is by upgrading a dependency to a compromised version. I don’t know if you all use the nodejs ecosystem here, but if so, please set your npm/pnpm config files to never automatically run scripts, and please set a minimum cool-off period before a dependency version becomes installable. Most compromised dependency versions are found in the wild and remedied within days or weeks. Therefore, programmatically setting a policy not to install dependency versions younger than, say, 5 days gets rid of 90% (I’m guessing) of supply chain attacks.
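As a sketch of what that configuration could look like (`ignore-scripts` is a standard npm/pnpm option; the cool-off setting is, to my knowledge, pnpm-specific and relatively recent, so verify the exact key and units against your package manager’s current docs before relying on it):

```ini
# .npmrc — hardening sketch, not a drop-in config
# Never automatically run install/postinstall lifecycle scripts:
ignore-scripts=true
# pnpm (recent versions): refuse to install versions published too recently,
# e.g. roughly 5 days expressed in minutes — verify the key name for your version:
minimum-release-age=7200
```

With `ignore-scripts` on, packages that genuinely need build steps have to be allow-listed explicitly, which is exactly the friction you want here.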
Next, you want appropriate role-based access, and you want your API keys to be safe and untouchable even if your own personal machine gets pwned. I’m not sure how much of this y’all already do, to me it seems obvious, but you stated that it’s more like “early stage startup”, so I’m urging you to do the low-hanging fruit first, if you haven’t already.
Remember, all of this is stuff that’s completely unrelated to Mythos. These are things that Mythos arguably wouldn’t even be able to do anything about because Mythos is specialized for finding vulns and exploit development by looking into source code. I’m sure they’ve baked in some other non-white-box capabilities, but the “scary” part (finding thousands of vulns) is based on inspecting source code. So, no source code, no Mythos-like danger. You’re still stuck with: phishing attempts and supply chain pwnage. And those things may get more scary in the next years, but they’re entirely fixable right now.
I don’t understand what you are saying here. You can totally do basically this exact thing, and when we’ve done it with the latest generation of models, we have indeed found some security vulnerabilities. Why would this not work? How do you think Anthropic found security vulnerabilities in many popular open source repos?
Wasn’t aware of the open source codebase, my bad.
My point more broadly was: you cannot point it to <some domain> and magically hack it.
But yeah, if you have everything open-sourced, then it’s much easier to find source code that contains vulnerabilities such that they would allow RCE.
Note that lesswrong.com is open source, which can be easily found by googling “lesswrong.com github”.
I wasn’t aware of the open source codebase, my bad
You might not be aware LW is open-source?
Wasn’t aware, yup
I created a market related to some of this post’s predictions: https://manifold.markets/distbit/will-an-individualised-ransomware-c