Project Glasswing: Anthropic Shows The AI Train Isn’t Stopping
Note: This was initially written for a more general audience, but does contain information that I feel that even the average LW user might benefit from. Oh, and zero AI involvement in the writing, even if I could have been amused by getting Claude to do the work for me (and even if I expect that it would have done a good job at it). If you want a better breakdown of the technical details, read the Model Card or wait for Zvi.
In AI/ML spaces where I hang around (mostly as a humble lurker), there have been rumors that the recent massive uptick in valid and useful submissions for critical bugfixes might be attributable to a frontier AI company.
I specify “valid” and “useful”, because most OSS projects have been inundated with a tide of low-effort, AI-generated submissions. While these particular ones were usually not tagged as AI by the authors, they were accepted and acted upon, which sets a rather high floor on their quality.
Then, after the recent Claude Code leak, hawk-eyed reviewers noted that Anthropic had internal flags that seemed to prevent AI agents disclosing their involvement (or nature) when making commits. Not a feature exposed to the general public, AFAIK, but reserved for internal use. This was a relatively minor talking point compared to the other juicy tidbits in the code.
Since Anthropic just couldn’t catch a break, an internal website was leaked, which revealed that they were working on their next frontier model, codenamed either Mythos or Capybara (both names were in internal use). This was… less than surprising. Everyone and their dog knows that the labs are working around the clock on new models and training runs. Or at least my pair do. What was worth noting was that Anthropic had, for the last few years, released 3 different tiers of model—Haiku, Sonnet and Opus, in increasing order of size and capability (and cost). But Mythos? It was presented as being plus ultra, too good to simply be considered the next iteration of Opus, or perhaps simply too expensive (Anthropic tried hard to explain that the price was worth it).
But back to the first point: why would a frontier company do this?
Speculation included:
A large breakthrough in cyber-security capabilities, particularly in offense (but also in defense) which meant a serious risk of users with access to the models quickly being able to automate the discovery and exploitation of long dormant vulnerabilities, even in legacy code with plenty of human scrutiny.
This would represent very bad press, similar to Anthropic’s headache after hackers recently used Claude against the Mexican government. It’s one thing to have your own tooling for vetted users or approved government use, it’s another for every random blackhat to use it in that manner. You cannot release it to the general public yet—the capability jump is large enough that the offensive applications are genuinely concerning before you have defensive infrastructure in place. But the vulnerabilities it’s finding exist right now, in production code running on critical systems worldwide. You cannot un-find them. And you have no particular reason to believe you are the only actor who will eventually find them.
Thus, if a company notices that their next model is a game-changer, it might be well worth their time to proactively fix bugs with said model. While the typical OSS maintainer is sick and tired of junk submissions, they’d be far more receptive when actual employees of the larger companies vouch for their AI-assisted or entirely autonomous work (and said companies have probably checked to make sure their claims hold true).
And, of course, street cred and goodwill. Something the companies do need, with increasing polarization on AI, including in their juiciest demographic: programmers.
I noted this, but didn’t bother writing it up because, well, they were rumors, and I’ve never claimed to be a professional programmer.
And now I present to you:
Project Glasswing by Anthropic
Today we’re announcing Project Glasswing, a new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks in an effort to secure the world’s most critical software. We formed Project Glasswing because of capabilities we’ve observed in a new frontier model trained by Anthropic that we believe could reshape cybersecurity. Claude Mythos Preview is a general-purpose, unreleased frontier model that reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.
Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.* Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.
..
Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.
Examples given:
Mythos Preview found a 27-year-old vulnerability in OpenBSD—which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it;
It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem;
The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine.
We have reported the above vulnerabilities to the maintainers of the relevant software, and they have all now been patched. For many other vulnerabilities, we are providing a cryptographic hash of the details today (see the Red Team blog), and we will reveal the specifics after a fix is in place.
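For readers unfamiliar with the "publish a hash now, reveal the details later" trick, here is a minimal sketch of how such a commitment can work. The announcement does not say which hash function or format is used, so SHA-256, the random nonce, and the function names `commit` and `verify` are my assumptions for illustration, not a description of Anthropic's actual procedure.

```python
import hashlib
import hmac
import secrets


def commit(details: bytes) -> tuple[str, bytes]:
    """Return (public_digest, secret_nonce) for a write-up kept private.

    The nonce is an assumption of this sketch: it stops anyone from
    brute-forcing short or guessable write-ups against the digest.
    """
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + details).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, details: bytes) -> bool:
    """Check that the revealed nonce and details match the earlier digest."""
    expected = hashlib.sha256(nonce + details).hexdigest()
    return hmac.compare_digest(expected, digest)


if __name__ == "__main__":
    # Hypothetical write-up text, purely for illustration.
    report = b"example vulnerability write-up"
    public_digest, nonce = commit(report)
    print("published today:", public_digest)
    # Later, once a fix has shipped, the nonce and report are revealed.
    print("checks out:", verify(public_digest, nonce, report))
```

The point of the design is that the digest proves the write-up existed on the day it was published, without leaking anything an attacker could use before the patch lands.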
Well. How about that. I wish the skeptics good luck, someone’s going to be eating their hat very soon, and it’s probably not going to be me. I’ll see you in the queue for the dole. Being right about these things doesn’t really get me out of the lurch either, Cassandra’s foresight brought about no happy endings for anyone involved. I am not that pessimistic about outcomes, in all honesty, but the train shows no signs of stopping.
Edit: A link to the Substack version of this post. I don’t think you should consider me an authoritative source when it comes to AI/ML, at best I’m the kind of nerd who reads the relevant papers with keen interest. But God knows the quality of discourse around the topic is so bad that you can do worse.
Outsiders like myself can do some things to take advantage of this program. Using software that is confirmed to get patches is the best option, but that can’t cover all use cases. Use Chromium to watch videos[1], listen to audio, and read PDF/text/HTML documents. Use Firefox to edit PDFs. Use the latest Linux kernel from Greg Kroah-Hartman’s tree (not Linus’s tree), either from kernel.org or from the repos of e.g. Debian testing or Arch Linux. I don’t have a suggestion for reading `.epub` E-Books, except writing a Haskell program using pure functions from the pandoc project to convert to PDF, though this seems not to always work.
Note that you will need to keep all this software as up to date as possible, but this may make you more vulnerable to supply chain attacks. You will need to do this until a few weeks after the end of the Glasswing project. Be careful of how you source program updates, and don’t blindly update dependencies. Use upstream lock files if possible to get fixes to vulnerabilities not disclosed outside of Project Glasswing.
My most important comment here is on the nature of VMs running under Linux. The KVM hypervisor is part of the Linux kernel, and therefore is part of Project Glasswing. What I’m not sure about is the surrounding userspace software that runs on the host and usually isn’t sandboxed (very well), like QEMU, libvirt, and swtpm. Note that I’m nearly certain that Mythos developed a privilege escalation that could go from an RCE in any of these projects to complete control over the host system, or at least root/write access to all filesystems. I would like a statement from one of the companies involved in Project Glasswing that they have tested the host userspace programs around KVM, not just KVM itself. This is important because if the interior of the VM is compromised, it can communicate with the virtual devices these software packages provide, e.g. virtual drives and security devices.
If there is information that this is being worked on, then I suggest running programs that you don’t think are getting Project Glasswing support in a KVM/libvirt VM on the newest stable Linux kernel. Note that everything that comes out of these VMs needs to be considered contaminated, and must only be opened in e.g. Chromium or another known-Glasswing-patched program. You may need to email these files to other people however, and I don’t have a solution to that.
This seems to work now even for some `.mkv` files, but I don’t think this is general. You can try to convert them to `.webm` using FFmpeg in a VM, but note that all files that come out of the VM are considered contaminated, and therefore need to be played back in Chromium, not a standard media player that doesn’t feature Chromium’s strong (and Glasswing tested) sandbox. See later for VM security considerations.
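As a rough sketch of the conversion step, something like the following could be run inside the VM. The codec choices (VP9 video via `libvpx-vp9`, Opus audio via `libopus`) and the helper names `webm_command` and `convert` are my own assumptions; they require an FFmpeg build with those encoders, and none of this has been checked against any Glasswing guidance.

```python
import subprocess
from pathlib import Path


def webm_command(src: Path, dst: Path) -> list[str]:
    """Build the FFmpeg argument list for an .mkv to .webm transcode."""
    return [
        "ffmpeg",
        "-nostdin",            # never wait for keyboard input
        "-i", str(src),        # untrusted input file
        "-c:v", "libvpx-vp9",  # VP9 video (assumed choice)
        "-c:a", "libopus",     # Opus audio (assumed choice)
        str(dst),
    ]


def convert(src: Path, dst: Path) -> None:
    """Run the transcode; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(webm_command(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(webm_command(Path("input.mkv"), Path("output.webm"))))
```

Passing the arguments as a list rather than a shell string means odd characters in a file name are not interpreted by a shell. The output file is still untrusted, so the playback advice above applies to it unchanged.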
I’m somewhat concerned about the possible problems that the recent increased load of patches may cause during the creation of the Linux 7.0.1 release. In theory it’s just a matter of checking the applicability of the entire set of patches to Linus’s tree, but given the situation I think the consequence of something getting missed is higher than normal[1].
I think an alternative solution of using the 6.19.XX series from Greg K-H until a few days after its last release is a better idea, but it’s close, ~0.35 that it ends up worse[2]. I think better automation is needed.
This may require building the kernel for yourself unless Greg K-H ends the series early, but until then here are the required file changes for Debian 13:
Config
/etc/apt/preferences.d/testingToChangePriority
/etc/apt/preferences.d/testingKernelBackport
/etc/apt/sources.list.d/debian.sources
Note that I’ve tested this and it doesn’t seem to work correctly when your system is set up to build kernel modules from source to install into this new kernel, because it causes other dependencies to update to the testing version. Otherwise, I tested it to work on multiple systems.
Where normally the churn might be in an obscure driver where failures cause problems for few users.
E.g. what happened here? https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.6.y&id=5a1e865e51063d6c56f673ec8ad4b6604321b455
In the most annoying of all possible worlds, they held back some really nice bugs to sell to national militaries, and used their safeguards skills to ensure that users hunting for those bugs using the model are misled.
Possibly, neither of us is in a position to judge with certainty. But I doubt that Anthropic is feeling particularly helpful, given their recent falling out with the US government.
If you’re a company that wasn’t in on Mythos, expect your stock to tank when it gets released. Building the tool and using it for the benefit of a self-selected elite is gross.