These are certainly issues that would arise if a feature like this were implemented poorly, but I think you’re overestimating the difficulty of implementing it well, given the current intelligence level of the underlying models. You would certainly want to filter aggressively for duplicates and false positives, but multiple AI vulnerability-search projects are already doing this successfully. Sending a proof-of-concept script could be problematic, but in practice I think recipients mostly don’t run proof-of-concept scripts; they just need the writeup and the filename/line numbers. Data exfiltration could theoretically happen, but coding agents typically already run in an environment where they could exfiltrate information if they wanted to, e.g. by sneaking code that makes a network request into the software they’re working on, and this hasn’t been much of a problem in practice.
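To gesture at the kind of filtering I mean, here’s a minimal sketch of how duplicate reports could be collapsed before anything gets sent out. The report fields, the coarse-key bucketing, and the similarity threshold are all made up for illustration; a real pipeline would tune these against actual model output:

```python
# Hypothetical sketch of duplicate filtering for AI-generated vulnerability
# reports. The Report fields and the 0.8 threshold are illustrative
# assumptions, not taken from any existing project.
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Report:
    file_path: str
    vuln_class: str   # e.g. "xss", "sqli", "path-traversal"
    line_start: int
    writeup: str


def coarse_key(r: Report) -> tuple:
    # Exact-match key: two reports pointing at the same file and
    # vulnerability class within a few lines of each other are almost
    # certainly duplicates.
    return (r.file_path, r.vuln_class, r.line_start // 10)


def is_near_duplicate(a: Report, b: Report, threshold: float = 0.8) -> bool:
    # Fuzzy fallback: compare writeup text when the coarse key differs,
    # since models often describe the same bug from different entry points.
    return SequenceMatcher(None, a.writeup, b.writeup).ratio() >= threshold


def dedupe(reports: list[Report]) -> list[Report]:
    kept: list[Report] = []
    seen_keys: set[tuple] = set()
    for r in reports:
        key = coarse_key(r)
        if key in seen_keys:
            continue
        if any(is_near_duplicate(r, k) for k in kept):
            continue
        seen_keys.add(key)
        kept.append(r)
    return kept
```

False-positive filtering would be a separate stage on top of this (e.g. a second model pass that tries to refute each report), but deduplication alone already cuts out a lot of the noise that makes naive versions of this feature annoying.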
(LessWrong has recently received security vulnerability reports from AI security researchers; we haven’t published the writeup yet, but will probably do so later today. They were not false positives. They did include proof-of-concept scripts, and the presence of those scripts served as a credible signal that the issues were real, but we didn’t run them, and I expect it would be pretty rare for developers to run proof-of-concept scripts in that sort of context.)