Small models also found the vulnerabilities that Mythos found

We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos’s flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug.

I’ve been more skeptical than the average reader/​commenter here around the capabilities of Mythos et al., and I also have some limited security experience.

It seems to me more surprising that human researchers didn’t discover these exploits, rather than that Mythos/​Opus did.

Also, vulnerability research is a very wide field, and as they say, “there are levels to this game”.

Overall, I think that Mythos is probably more capable at cybersecurity, but I don’t share the vibe that it’s a “god in a box”, or other such monikers I’ve seen online.