I don’t expect that you can simply point Mythos towards the lesswrong.com domain and tell it “you’re in a CTF, hack this site”—finding vulns in source code is a different type of activity.
I don’t understand what you are saying here. You can totally do basically this exact thing, and when we’ve done it with the latest generation of models, we have indeed found some security vulnerabilities. Why would this not work? How do you think Anthropic found security vulnerabilities in many popular open source repos?
Wasn’t aware of the open source codebase, my bad.
My broader point was: you cannot just point it at <some domain> and magically hack it.
But yeah, if everything is open-sourced, then it’s much easier to find vulnerabilities in the source code, including ones that would allow RCE.