Ian McKenzie comments on The bitter lesson of misuse detection

Ian McKenzie 11 Jul 2025 2:04 UTC
2 points
Our paper on defense in depth (STACK) found similar results: similarly sized models with a few-shot prompt significantly outperformed the specialized guard models, even after adjusting for the false positive rate (FPR) on benign queries.
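
For readers unfamiliar with what adjusting for FPR involves, here is a minimal sketch of one common way to compare two classifiers at a matched false positive rate. It is not the evaluation code from the paper. The function names `threshold_at_fpr` and `recall_at_fpr` are hypothetical, and it assumes each classifier outputs a numeric harmfulness score where higher means more likely harmful.

```python
import math


def threshold_at_fpr(benign_scores, target_fpr):
    """Return a threshold such that flagging scores strictly above it
    flags at most `target_fpr` of the benign queries."""
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign queries we are allowed to flag at this FPR budget.
    allowed = math.floor(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return -math.inf  # budget permits flagging every benign query
    # At most `allowed` scores are strictly greater than ranked[allowed].
    return ranked[allowed]


def recall_at_fpr(benign_scores, harmful_scores, target_fpr):
    """Fraction of harmful queries flagged once the threshold has been
    calibrated on benign queries to respect the FPR budget."""
    threshold = threshold_at_fpr(benign_scores, target_fpr)
    flagged = sum(score > threshold for score in harmful_scores)
    return flagged / len(harmful_scores)


# Illustrative, made-up scores (assumption: not real data from any paper).
benign = [0.1, 0.2, 0.3, 0.9]
harmful = [0.2, 0.5, 0.8, 0.95]
recall = recall_at_fpr(benign, harmful, target_fpr=0.25)
```

Calibrating each classifier's threshold separately on the same benign set, then comparing recall on harmful queries, keeps a classifier from looking better just because it flags more of everything.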