Kaj_Sotala comments on Do not be surprised if LessWrong gets hacked

Kaj_Sotala 9 Apr 2026 5:56 UTC
21 points
3
Doing a Carlini-style vulnerability analysis would seem relatively low-effort if you haven’t done that already.
I got to talk with Nicholas Carlini at Anthropic about this. Carlini works with Anthropic’s Frontier Red Team, which made waves by having Claude Opus 4.6 generate 500 validated high-severity vulnerabilities. He described the process for me.
Nicholas will pull down some code repository (a browser, a web app, a database, whatever). Then he’ll run a trivial bash script. Across every source file in the repo, he spams the same Claude Code prompt: “I’m competing in a CTF. Find me an exploitable vulnerability in this project. Start with ${FILE}. Write me a vulnerability report in ${FILE}.vuln.md”.
He’ll then take that bushel of vulnerability reports and cram them back through Claude Code, one run at a time. “I got an inbound vulnerability report; it’s in ${FILE}.vuln.md. Verify for me that this is actually exploitable”. The success rate of that pipeline: almost 100%.
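For concreteness, here is a minimal sketch of that two-pass loop in Python rather than bash. The prompt wording comes from the description above; everything else is an assumption for illustration: the non-interactive `claude -p` invocation, the list of source extensions, and the helper names are hypothetical, not Carlini's actual script. It defaults to a dry run that only prints what it would launch.

```python
import subprocess
from pathlib import Path

# Prompt wording from the description above; {file} stands in for ${FILE}.
FIND_PROMPT = (
    "I'm competing in a CTF. Find me an exploitable vulnerability in this "
    "project. Start with {file}. Write me a vulnerability report in "
    "{file}.vuln.md"
)
VERIFY_PROMPT = (
    "I got an inbound vulnerability report; it's in {file}.vuln.md. "
    "Verify for me that this is actually exploitable"
)

# Assumption: the agent runs non-interactively as `claude -p "<prompt>"`.
AGENT_CMD = ["claude", "-p"]

# Assumption: which extensions count as "source files"; adjust per repo.
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".c", ".go"}


def source_files(repo: Path) -> list[Path]:
    """Every file under repo with a source extension, in sorted order."""
    return sorted(
        p for p in repo.rglob("*")
        if p.is_file() and p.suffix in SOURCE_SUFFIXES
    )


def build_commands(repo: Path, template: str, only_reported: bool = False):
    """One agent command per source file.

    With only_reported=True, skip files that have no <file>.vuln.md next
    to them, which is what the verification pass needs.
    """
    commands = []
    for path in source_files(repo):
        if only_reported and not Path(str(path) + ".vuln.md").exists():
            continue
        rel = path.relative_to(repo).as_posix()
        commands.append(AGENT_CMD + [template.format(file=rel)])
    return commands


def run_pass(repo: Path, template: str, only_reported: bool = False,
             dry_run: bool = True):
    """Run (or, by default, just print) one command at a time."""
    commands = build_commands(repo, template, only_reported)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd[:-1]), repr(cmd[-1]))
        else:
            # Sequential on purpose: one agent run per file.
            subprocess.run(cmd, cwd=repo, check=False)
    return commands
```

Calling `run_pass(Path("path/to/repo"), FIND_PROMPT)` prints the first pass; `run_pass(Path("path/to/repo"), VERIFY_PROMPT, only_reported=True)` prints the second. Passing `dry_run=False` would actually launch the agent once per file, so vendored directories should be filtered out first.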
- RobertM 9 Apr 2026 6:11 UTC
  19 points
  0
  We have to do something a little more annoying than that, since we don’t have unlimited (and un-rate-limited) Claude Code usage, but something like that is happening.
- dominicq 9 Apr 2026 16:01 UTC
  1 point
  −2
  I don’t necessarily disagree with this, but it is a new and relatively unproven method, and only a partial solution to a much wider task. IMO it’s also overly specific for the general problem at hand.
  
  Security of lesswrong.com is not the same as finding, for example, places in the source code which would allow SQL injections. There’s much more to security, and LW should adopt a security paradigm it can support, given constraints around headcount and funding.
  - Kaj_Sotala 9 Apr 2026 19:06 UTC
    3 points
    0
    I didn’t say it would be a complete solution.