I googled the source link here: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
I’m also concerned about isolating the code. It’s the difference between finding a needle in a haystack and distinguishing a needle from a single straw. Their set of models returned 12/18 false positives (and 18/18 true positives), which suggests terrible specificity to me.
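Spelling out the arithmetic (a sketch, assuming the 18 runs on patched code mirror the 18 runs on vulnerable code, which is how I read the post):

```python
# Sensitivity/specificity from the reported counts.
# Assumed setup: 18 vulnerable versions (all flagged correctly)
# and 18 patched versions (12 flagged in error).

true_positives = 18   # vulnerable versions correctly flagged
false_negatives = 0   # vulnerable versions missed
false_positives = 12  # patched versions incorrectly flagged
true_negatives = 6    # patched versions correctly passed

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity: {sensitivity:.2f}")  # 1.00
print(f"specificity: {specificity:.2f}")  # 0.33
```

So perfect recall, but a detector that flags two-thirds of clean code is close to useless for triage.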
They can detect the problem, but not develop a working exploit. They also had a 2/3 false positive rate on the patched version of that function.
They say that’s fine because something other than the (frontier) model can do those steps, but don’t demonstrate the capability anywhere I could see.