Our paper on defense in depth (STACK) found similar results: similarly sized models with a few-shot prompt significantly outperformed the specialized guard models, even after adjusting for the false positive rate (FPR) on benign queries.