It’s hard to measure distance in terms of how much additional research is needed!
To be confident in an unexploitable search method, you’d definitely want empirical evidence. And I can imagine empirical evidence strong enough to give confidence that our method helps enough to really matter.
Here’s an evaluation that would give me confidence: take an adversarially initialized model (e.g. one produced by imitation learning on code containing vulnerabilities) and train it on our game. If the rate at which the model produces vulnerable code returns to baseline (within a number of training steps that is, say, linear in the number of IL samples), then I’d be fairly confident that our method was useful!
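To make the shape of that evaluation concrete, here is a minimal sketch of the recovery test. All of the names here (imitation_learn, train_on_game_step, vuln_rate, recovery_evaluation) are hypothetical placeholders I'm assuming for illustration, not anything from an actual codebase; the point is just the protocol: adversarially initialize, train on the game, and check whether the vulnerability rate returns to baseline within a step budget linear in the number of IL samples.

```python
# Hypothetical sketch of the recovery evaluation described above.
# The callables passed in stand for whatever training and
# vulnerability-measurement machinery you actually have.

from typing import Any, Callable


def recovery_evaluation(
    base_model: Any,
    vulnerable_samples: list,
    imitation_learn: Callable[[Any, list], Any],   # fine-tune on vulnerable code
    train_on_game_step: Callable[[Any], Any],      # one training step of the game
    vuln_rate: Callable[[Any], float],             # measured rate of vulnerable outputs
    budget_factor: float = 1.0,
    tolerance: float = 0.01,
) -> dict:
    """Adversarially initialize via imitation learning on vulnerable code,
    then train on the game and check whether the vulnerability rate returns
    to baseline within a step budget linear in the number of IL samples."""
    baseline = vuln_rate(base_model)

    # Adversarial initialization: imitate code containing vulnerabilities.
    model = imitation_learn(base_model, vulnerable_samples)
    adversarial_rate = vuln_rate(model)

    # Step budget linear in the number of imitation-learning samples.
    max_steps = int(budget_factor * len(vulnerable_samples))

    for step in range(1, max_steps + 1):
        model = train_on_game_step(model)
        if vuln_rate(model) <= baseline + tolerance:
            return {"recovered": True, "steps": step,
                    "baseline": baseline, "adversarial_rate": adversarial_rate}

    return {"recovered": False, "steps": max_steps,
            "baseline": baseline, "adversarial_rate": adversarial_rate,
            "final_rate": vuln_rate(model)}
```

Passing the training and measurement procedures in as callables keeps the sketch agnostic about the particular model, game, and vulnerability detector, which is really the only honest way to write it at this level of detail.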