Do not be surprised if LessWrong gets hacked
Or, for that matter, anything else.
This post is meant to be two things:
a PSA about LessWrong’s current security posture, from a LessWrong admin[1]
an attempt to establish common knowledge of the security situation it looks like the world (and, by extension, you) will shortly be in
Claude Mythos was announced yesterday. That announcement came with a blog post from Anthropic’s Frontier Red Team, detailing the large number of zero-days (and other security vulnerabilities) discovered by Mythos.
This should not be a surprise if you were paying attention—LLMs being trained on coding first was a big hint, the labs putting cybersecurity as a top-level item in their threat models and evals was another, and frankly this blog post maybe could’ve been written a couple months ago (either this or this might’ve been sufficient). But it seems quite overdetermined now.
LessWrong’s security posture
In the past, I have tried to communicate that LessWrong should not be treated as a platform with a hardened security posture. LessWrong is run by a small team. Our operational philosophy is similar to that of many early-stage startups. We treat some LessWrong data as private in a social sense, but do not consider ourselves to be in the business of securely storing sensitive information. We make many choices and trade-offs in the direction that marginally favor speed over security, which many large organizations would make differently. I think this is reasonable and roughly endorse the kinds of trade-offs we’re making[2].
I think it is important for you to understand the above when making decisions about how to use LessWrong. Please do not store highly sensitive information in LessWrong drafts, or send it to other users via LessWrong messages, with the expectation that LessWrong will be robust to the maybe-upcoming-wave-of-scaled-cyberattacks.
LessWrong is not a high-value target
While LessWrong may end up in the affected blast radius simply due to its nature as an online platform, we do not store the kind of user data that cybercriminals in the business of conducting scaled cyberattacks are after. The most likely outcome of a data breach is that the database is scanned (via automated tooling) for anything that looks like account credentials, crypto wallet keys, LLM inference provider API keys, or similar. If you have ever stored anything like that in a draft post or sent it to another user via LessWrong DM, I recommend cycling it immediately.
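To make that concrete, here is a minimal sketch of the kind of automated credential scan an attacker might run over a leaked dump. The patterns are hypothetical illustrations (real tooling uses much larger pattern sets), not an authoritative list:

```python
import re

# Illustrative secret-shaped patterns only; real scanners ship hundreds of these.
PATTERNS = {
    "openai_style_api_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}

def scan(text: str) -> list:
    """Return the names of all credential-like patterns found in a blob of text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]
```

The point is that this kind of scan is cheap and fully automated, which is why anything credential-shaped in a draft or DM should be assumed found and cycled.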
It is possible that e.g. an individual with a grudge might try to dig up dirt on their enemies. I think this is a pretty unlikely threat model even if it becomes tractable for a random person to point an LLM at LessWrong and say “hack that”. In that world, I do expect us (the LessWrong team) to clean up most of the issues obvious to publicly-available LLMs relatively quickly, and also most people with grudges don’t commit cybercrime about it.
Another possibility is that we get hit by an untargeted attack and all the data is released in a “public” data dump. It’s hard to get good numbers for this kind of thing, but there are a few reasons for optimism[3] here:

From what I could find, probably well under half of data breaches result in datasets that get publicly circulated in any meaningful sense.
Many of those that do are “for sale”, not freely available. Someone with a chip on their shoulder might download a freely available dataset, but is much less likely to spend money on it (and also risk the eye of the state, if they then try to use that purchased data for anything untoward).
Datasets like this often don’t ever really “go away”, but they often do become unavailable, especially if they’re large. Storage is expensive, hosting sites generally take them down on request, torrenting is risky, and there isn’t much motive to keep re-uploading terabytes of data that you aren’t even selling. (Monetizable datasets tend to be stripped down and much smaller, but also wouldn’t include approximately any of the information that you might be concerned about here.)
FAQ
What “private” data of mine could be exposed in a breach?
Your email address(es)
A hashed version of your password
Your previous display name, if you’ve changed it (not technically a secret)
Analytics data about e.g. what pages you’ve visited on LessWrong, and in some cases what you’ve clicked on
Any information that may have come from your OAuth providers (Google, Github, Facebook)
Messages to other users
Draft posts and comments
Deleted comments
Draft revisions of published posts
Your frontpage tag filter settings
Your voting history
Your location data (if you provided it for e.g. being notified of nearby events)
Posts you’ve read
Your bookmarks
Posts you’ve hidden
Information you’ve given us to enable us to pay you money (if you provided it for e.g. Goodhart Tokens), such as a dedicated Paypal email address. (We do not store any e.g. credit card information that you would use to pay us money.)
Your notifications
Your account’s moderation history (if any)
Actions you’ve taken in previous Petrov Days
Your user agent and referer
Any messages you’ve sent to LLMs via one of the two embedded LLM chat features we’ve built, and responses received
Probably other things that aren’t coming to mind, though I’m pretty sure I’ve covered the big ones above. If you’re curious, our codebase is open source; you’re welcome to examine it yourself (or sic your own LLM on it).
Can I delete my data?
No*. Nearly all of the data we store is functional. It would take many engineer-months to refactor the codebase to support hard-deletion of user data (including across backups, which would be required for data deletion to be “reliable” in the case of a future data breach), and this would also make many site features difficult or impractical to maintain in their current states. Normatively, I think that requests for data deletion are often poorly motivated and impose externalities on others[4]. Descriptively, I think that most requests for data deletion from LessWrong would be mistakes if they were generated by concerns about potential data breaches. Separately, many data deletion requests concern publicly-available data (such as published posts and comments) that is often already captured by various mirrors and archives, and we don’t have the ability to enforce its deletion. I’ll go into more detail on my thinking on some of this in the next section of the post.
* If you are a long-standing site user and think that you have a compelling case for hard-deleting a specific piece of data, please feel free to message us, but we can’t make any promises about being able to allocate large amounts of staff time to this. e.g. we may agree to delete your DMs, after giving other conversation participants time to take their own backups.
Is LessWrong planning on changing anything?
We have no immediate plans to change anything. If the cost of auditing our own codebase falls below some threshold, that would motivate us to conduct a dedicated audit, but we are not quite there yet[5].
The Broader Situation
Epistemic status: I am not a security professional. I am a software engineer who has spent more time thinking about security than the median software engineer, but maybe not the 99th percentile. This section necessarily requires some extrapolation into the uncertain future.
A proper treatment of “what’s about to happen” really deserves its own post, ideally by a subject-matter expert (or at least someone who’s spent quite a bit more time thinking about this question than I have). I nonetheless include some very quick thoughts below, mostly relevant to US-based individuals who don’t have access to highly sensitive corporate secrets[6] or classified government information.
Many existing threat models don’t seem obviously affected by the first-order impacts of a dramatic increase in scalable cyber-offensive capabilities. Four threat models which seem likely to get worse are third-party data breaches, software supply chain attacks, ransomware, and cryptocurrency theft.
I’m not sure what to do about data breaches, in general. The typical vector of exploitation is often various forms of fraud involving identity theft or impersonation, but scaled blackmail campaigns[7] wouldn’t be terribly shocking as a “new” problem. One can also imagine many other problems cropping up downstream of LLMs providing scalable cognition, enabling many avenues of value extraction that were previously uneconomical due to the sheer volume of data. If you’re worried about identity theft, set up a credit freeze[8]. Behave virtuously. If you must behave unvirtuously, don’t post evidence of your unvirtuous behavior on the internet, not even under a very anonymous account that you’re sure can’t be linked back to you.
Software supply chain attacks seem less actionable if you’re not a software engineer. This is already getting worse and will probably continue to get worse. Use a toolchain that lets you pin your dependencies, if you can. Wait a few days after release before upgrading to the newest version of any dependency. There are many other things you can do here; they might or might not pass a cost-benefit analysis for individuals.
Scaled ransomware
Everybody is already a target. They want your money and will hold the contents of your computer hostage to get it.
This probably gets somewhat worse in the short-term with increased cybersecurity capabilities floating around. The goal of the attacker is to find a way to install ransomware on your computer. Rapidly increasing cybersecurity capabilities differentially favor attackers since there are multiple defenders and any one of them lagging behind is often enough to enable marginal compromises[9].
To date, scaled ransomware campaigns of the kind that extort large numbers of individuals out of hundreds or thousands of dollars apiece have not been trying to delete (or otherwise make inaccessible) backups stored in consumer backup services like Backblaze, etc[10]. My current belief is that this is mostly a contingent fact about the economic returns of trying to develop the relevant feature-set, rather than due to any fundamental difficulty of the underlying task.
As far as I can tell, none of the off-the-shelf consumer services like this have a feature that would prevent an attacker with your credentials from deleting your backups immediately. Various companies (including Backblaze) offer a separate object storage service, with an object lock feature that prevents even the account owner from deleting the relevant files (for some period of time), but these are not off-the-shelf consumer services and at that point you’re either rolling your own or paying a lot more (or both).
If you are concerned about the possibility of losing everything on your computer because of ransomware[11], it is probably still worth using a service like this. The contingent fact of scaled ransomware campaigns not targeting these kinds of backups may remain true. Even if it does not remain true, there are some additional things you should do to improve your odds:
Set your 2FA method to rely on TOTP, not a code sent by email or SMS.
Do not install the app generating TOTPs on your computer.
Do not check “Remember this browser” when entering your 2FA code to sign in to their website. If you’ve already done that, delete all the cookies in your browser for the relevant domains.
This increases the number of additional security boundaries the ransomware would need to figure out how to violate, in order to mess with your backups.
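For intuition about why keeping the TOTP app off your computer is a real security boundary: a TOTP code is just an HMAC of the current 30-second interval, keyed by a secret shared only between the service and the authenticator device, so ransomware on your machine has nothing local to steal. A minimal RFC 6238 sketch (standard library only):

```python
import base64
import hmac
import struct
import time

def totp(secret_b32, for_time=None, digits=6, step=30):
    """RFC 6238 TOTP (SHA-1 variant), the scheme authenticator apps implement."""
    key = base64.b32decode(secret_b32.upper())
    # Number of completed time steps since the Unix epoch.
    counter = int((time.time() if for_time is None else for_time) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), "sha1").digest()
    # Dynamic truncation: pick 4 bytes at an offset given by the last nibble.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Because the secret never leaves the authenticator device, an attacker who fully controls your browser and email still can’t mint valid codes.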
Scaled cryptocurrency theft
Everybody is already a target (since the attackers don’t know who might own cryptocurrency), but this mostly doesn’t matter if you don’t own cryptocurrency. The threat model here is similar to the previous one, except the target is not necessarily your computer’s hard drive, but anywhere you might be keeping your keys. I am not a cryptocurrency expert and have not thought about how I would safely custody large amounts[12] of cryptocurrency. Seems like a hard problem. Have you considered not owning cryptocurrency?
My extremely tentative, low-confidence guess is that for smaller amounts you might just be better off tossing it all into Coinbase. Third-party wallets seem quite high-risk to me; their security is going to be worse and you’ll have fewer options for e.g. recovery from equity holders after a breach. Self-custody trades off against other risks (like losing your keys). But this is a question where you can probably do better than listening to me with a couple hours of research, if you’re already in a position where it matters to you.
All of these probably deserve fuller treatments.
Habryka broadly endorses the contents of the “LessWrong’s security posture” section. Instances of the pronoun “we” in this post should generally be understood to mean “the members of the Lightcone team responsible for this, whatever this is”, rather than “the entire Lightcone team”. I’ll try to be available to answer questions in the comments (or via Intercom); my guess is that Habryka and Jim will also be around to answer some questions.
- ^
Me!
- ^
I won’t vouch for every single individual one, not having thought carefully enough about every single such choice to be confident that I would endorse it on reflection. Many such cases.
- ^
Which unfortunately are contingent on details of the current environment.
- ^
Though I won’t argue for that claim in this post, and it’s not load-bearing for the decision.
- ^
If you think you are qualified to do this (and are confident that you won’t end up spamming us with false-positives), please message us on Intercom or email us at team@lesswrong.com. We do not have a bug bounty program. Please do not probe our production APIs or infrastructure without our explicit consent. We are not likely to respond to unsolicited reports of security issues if we can’t easily verify that you’re the kind of person who’s likely to have found a real problem, or if your report does not include a clear repro.
- ^
This does unfortunately exclude many likely readers, since it includes lab employees, and also employees of orgs that receive such information from labs, such as various evals orgs.
- ^
We technically already have these, but they’re often targeting the subset of the population that is afraid of the attacker telling their friends and family that they e.g. watch pornography, which the attacker doesn’t actually know to be true (though on priors...) and also won’t do since they don’t know who your friends and family are. These attacks can become much scarier to a much larger percentage of the population, since personalization can now be done in a substantially automated way.
- ^
This won’t help with e.g. fraud against government agencies, or anything other than attackers opening financial accounts in your name.
- ^
This is not intended as a complete argument for this claim.
- ^
This is not the case for things like OneDrive/Dropbox/Google Drive, where you have a “sync” folder on your machine. It is also not the case for targeted ransomware attacks on large organizations of the kind that ask for 6-7 figures; those are generally bespoke operations and go through some effort to gain access to all of the backups before revealing themselves, since the backups are a threat to the entire operation.
- ^
Or hardware failure, or theft of your computer, or many other possibilities. But the further advice is specific to the ransomware case.
- ^
I’m not sure when the “hunt you down in person”-level attacks start. Maybe six figures? At any rate, don’t talk about your cryptocurrency holdings in public.
Are there air gapped backups, in case, say, someone with a grudge against rationalists or EAs decided to take Lightcone down and try to destroy every record they can? It would really suck to lose the whole history of LW. I don’t know what mirrors exist, or how vulnerable they might be to a determined attacker.
There are many many backups of public content (including things like archive.org and archive.is and other people who have taken their own backups).
I don’t think we have any air-gapped backups of private content, though I am sure I have some random old DB backups lying around in some random cloud drives somewhere, or an old laptop of mine.
I encourage anyone with files they’d rather not lose (photos, taxes, passwords, etc.) to start making rotating offline backups. Find some big-enough USB drives (flash or spinning are both fine) and buy ~5. Use a label maker or Sharpie to date each with its latest backup, and overwrite the oldest copy each time. Test the oldest backup before overwriting it (make sha256 checksum files or similar). Every year, or however often makes you feel comfortable, retire a backup drive and replace it with a new one in the rotation; the retired drive becomes an archive that you keep around indefinitely.
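A minimal sketch of the checksum step, assuming a simple one-directory-per-backup layout (file names and layout here are illustrative, not a prescription):

```python
import hashlib
from pathlib import Path

CHECKSUM_NAME = "SHA256SUMS"  # assumed convention for this sketch

def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_checksums(backup_dir: Path) -> Path:
    """Record a checksum for every file in the backup directory."""
    lines = [
        f"{sha256_file(p)}  {p.relative_to(backup_dir)}"
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file() and p.name != CHECKSUM_NAME
    ]
    out = backup_dir / CHECKSUM_NAME
    out.write_text("\n".join(lines) + "\n")
    return out

def verify_checksums(backup_dir: Path) -> bool:
    """Re-hash each file and compare against the recorded checksums."""
    for line in (backup_dir / CHECKSUM_NAME).read_text().splitlines():
        digest, rel = line.split("  ", 1)
        if sha256_file(backup_dir / rel) != digest:
            return False
    return True
```

Run `write_checksums` when you make a backup and `verify_checksums` on the oldest drive before overwriting it; a mismatch means bit rot or a failing drive, and you should retire that drive rather than reuse it.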
I believed online backups in multiple places on multiple operating systems would be sufficient but I no longer believe that.
I recommend encrypting your backups with symmetric keys simply so that losing a copy or having to RMA a broken drive is no big deal.
This seems too complacent to me. Any long-lived social media or communications utility should have some data retention policies which reduce the blast radius of an exploit and turn them into less of an endlessly growing radioactive waste dump of PII. I think this is especially true given how many people on LW have gone on to important positions or roles later in life (including in, say, cryptocurrency − 100% sufficient justification for meaningful hacking efforts); and remember the East Anglia or Hillary or Epstein emails, how badly even the most innocent communication could be abused by fanatics or fools or fraudsters? (I’ve been struck by how many of the ‘Epstein emails’ doing huge numbers on social media aren’t even real, and legitimated solely by the fact of a leak. In the postmodern oral culture, who bothers to factcheck anything, or so much as include a URL?)
Given how serious Mythos seems to be, that information leaks are irreversible, and that this is only going to escalate (remember, there’s usually a <=1 year lag from the best proprietary models to open source, so we may not even have until 2027 before mass attacks with zero guardrails or potential observability), it seems to me like this is a good time to implement a maximum retention period for DMs and purge all old DMs. I would suggest something like: announce via email to people with any DMs that all pre-2026 DMs will be deleted within one month, attach an export, and, going forward, delete all DMs after 1 year of inactivity.
(Airgapped LW2 backups should go without saying and already exist!)
I haven’t thought about the tradeoffs here that much, but I would be very sad if I were a user who forgot about the site for 1-2 years and came back expecting to find all my old DMs, only to discover they had all been deleted. I expect the online services I use to keep my data and not delete it, and I actively avoid any that don’t. I do not want to be in the habit of taking my own backups of all services I use.
That is why I said “and attach an export”.*
And personally I would rather a website delete my DMs than release them to the world. This is probably true of most of the people my DMs are with (whose opinion also matters).
* My reasoning here is that if old DMs have to live anywhere besides airgapped, physically-secured, encrypted backups, highly dispersed email accounts are the safest place: the major email providers are, in general, vastly more secure than LW2, are better equipped to respond rapidly to hacks, and have extensive controls to limit exfiltration; they all have early access to Mythos-class models to reduce damage early; and they are ‘too big to fail’ in the sense that if something like Gmail is cracked wide open and leaked, it will likely be such a global cataclysm that people really won’t be able to abuse the LW-related parts especially badly.
Ooh, interesting, I did fail to properly parse that you suggested directly attaching a DM export to the email. Yeah, that makes this less costly, though IMO still too annoying for anything I would want to use (of course I would prefer this over my DMs getting broadcast to the world, but really in almost any future I can see, the probability of anything like that still stays below 5%).
Compromise: if various other platforms experience the equivalent of DMs leaking (or you otherwise update that it’s >>5%), quickly do the gwern plan?
Something like that seems pretty reasonable.
You could keep any export that nobody downloaded in the airgapped archives, against some future day when you find a better point on the tradeoff curve.
There’s going to be so much of this over the coming years that I’m guessing people will be desensitized and stop giving a hoot.
That is a misunderstanding of how it works. They won’t ‘stop giving a hoot’ because it remains a useful weapon.
Not sure I understand you. So for example the US public seems pretty desensitized to the US executive having constant scandals and being blatantly corrupt, and that’s basically one big important actor doing lots of clearly really bad stuff. If there’s a constant flood of private communications being leaked I’d guess people would get really desensitized to it (as well as any particular leak being drowned out by the rest of the flood). So it would get less useful as a weapon because the public wouldn’t give as much of a hoot, is what I was trying to say.
Also noticing I shifted the goalposts here from “stop giving a hoot” to “wouldn’t give as much of a hoot”. I concede the original point as worded.
Example that maps better: people probably care less about things said by someone that aged poorly, since there’s been a huge flood of such things due to social media.
Doing a Carlini-style vulnerability analysis would seem relatively low-effort if you haven’t done that already.
We have to do something a little more annoying than that, since we don’t have unlimited (and un-rate-limited) Claude Code usage, but something like that is happening.
I don’t necessarily disagree with this, but this is a new and relatively unproven method which is only a partial solution to a much wider task. And IMO it’s overly specific for the general problem at hand.
Security of lesswrong.com is not the same as finding, for example, places in the source code which would allow SQL injections. There’s much more to security, and LW should adopt a security paradigm it can support, given constraints around headcount and funding.
I didn’t say it would be a complete solution.
Another corollary is that if you do want to have sensitive discussions with other LessWrong users, don’t exchange potentially sensitive information via private messages; switch to secure messengers like Signal instead.
Political organizing to stop AI development is potentially a target
Employees of AI companies that share nonprivate information are potentially a target
Thank you for posting this, and being so forthright about it.
One thought that occurs to me is that, insofar as you’re going to be a target anyways, you should put yourself in the same class as the largest possible number of people, where you’re more likely to have recourse once that class is compromised. E.g. make sure you’re getting all the latest security updates on all your devices even if these are still vulnerable to zero days or supply chain attacks, so you don’t end up as one of the poor fools that got hacked for using some particular outdated thing.
You can maybe try to avoid having any important information with orgs/software that you don’t expect to be running the leading edge not-yet-public compsec AIs over their code.
I’m interested in thinking about how equilibria are going to shift. E.g. I think people will care a lot less about blackmail if it becomes ubiquitous.
I want to offer a bit of skepticism around this whole post. For reference, I used to work in an information security company (specifically, software supply chain security and malware analysis), and am still relatively involved with cyber directly, though as an amateur hacker.
First, Mythos may or may not be super scary. We don’t know yet, as it’s private. It’s in Anthropic’s best interest to tastefully hype it up in their press releases. Just because they apparently have a very useful infosec model doesn’t mean that LessWrong will get hacked. Mythos, according to their press release and system card, isn’t a fully general hacking weapon. It’s just very good at finding exploits in source code. I don’t expect that you can simply point Mythos towards the lesswrong.com domain and tell it “you’re in a CTF, hack this site”—finding vulns in source code is a different type of activity.
Second, LessWrong should adopt some security posture which uses modern best practices. “Defense in depth” is the relevant concept here, and it’s adaptable to whatever are the constraints around funding and headcount. Basically, there are numerous layers to defense, and by stacking individual layers, you stop attackers in their tracks. This loosely corresponds to OSI layers: you want defenses on the network level, on the transport level, on the application level… For LW specifically, I don’t know what the stack is, and what the hosting situation is, and many other things, so I can’t comment with a lot of specificity.
But historically, the way many organizations get pwned is by upgrading a dependency to a compromised version. I don’t know if you all use the nodejs ecosystem here, but if so, please set your npm/pnpm config files to never automatically run scripts, and please set a minimum cool-off period before a dependency version becomes installable. Most compromised dependency versions are found in the wild and remedied within days or weeks. Therefore, programmatically setting a policy not to install dependency versions younger than, say, 5 days gets rid of 90% (I’m guessing) of supply chain attacks.
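As a sketch of what that configuration could look like (`ignore-scripts` is a standard npm/pnpm option; the cool-off setting is, to my knowledge, pnpm-specific and relatively recent, so verify the exact key and units against your package manager’s current docs before relying on it):

```ini
# .npmrc — hardening sketch, not a drop-in config
# Never automatically run install/postinstall lifecycle scripts:
ignore-scripts=true
# pnpm (recent versions): refuse to install versions published too recently,
# e.g. roughly 5 days expressed in minutes — verify the key name for your version:
minimum-release-age=7200
```

With `ignore-scripts` on, packages that genuinely need build steps have to be allow-listed explicitly, which is exactly the friction you want here.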
Next, you want appropriate role-based access, and you want your API keys to be safe and untouchable even if your own personal machine gets pwned. I’m not sure how much of this y’all already do, to me it seems obvious, but you stated that it’s more like “early stage startup”, so I’m urging you to do the low-hanging fruit first, if you haven’t already.
Remember, all of this is stuff that’s completely unrelated to Mythos. These are things that Mythos arguably wouldn’t even be able to do anything about because Mythos is specialized for finding vulns and exploit development by looking into source code. I’m sure they’ve baked in some other non-white-box capabilities, but the “scary” part (finding thousands of vulns) is based on inspecting source code. So, no source code, no Mythos-like danger. You’re still stuck with: phishing attempts and supply chain pwnage. And those things may get more scary in the next years, but they’re entirely fixable right now.
I don’t understand what you are saying here. You can totally do basically this exact thing, and when we’ve done it with the latest generation of models, we have indeed found some security vulnerabilities. Why would this not work? How do you think Anthropic found security vulnerabilities in many popular open source repos?
Wasn’t aware of the open source codebase, my bad.
My point more broadly was: you cannot point it to <some domain> and magically hack it.
But yeah, if you have everything open-sourced, then it’s much easier to find source code that contains vulnerabilities such that they would allow RCE.
Note that lesswrong.com is open source, which can be easily found by googling “lesswrong.com github”.
I wasn’t aware of the open source codebase, my bad
You might not be aware LW is open-source?
Wasn’t aware, yup
I created a market related to some of this post’s predictions: https://manifold.markets/distbit/will-an-individualised-ransomware-c