alexandracar comments on When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift