I’m not saying every research sabotage issue must be a sandbagging issue, just that the specific issue of “We ask an AI for open-ended advice on experimental prioritization. If the AI is deceptive, it might provide locally sound advice while also systematically deciding not to mention things we’ve overlooked in our experiments.” is well described as sandbagging.