TheManxLoiner comments on Building Black-box Scheming Monitors

TheManxLoiner 31 Jul 2025 10:40 UTC
1 point
0
Quick one: when you report AUROC (especially when the scores are very high!), it is always good to report the base rates too. Is the dataset split 50-50 between scheming and non-scheming?

Based on the original control paper, scheming was rare, so ground-truth data would be imbalanced (e.g. 90 or 95% of datapoints are non-scheming). If there is a large imbalance, AUPRC is more insightful.
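
To illustrate why the base rate matters, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the Gaussian score distributions, the sample sizes, and the helper names `auroc`, `average_precision`, and `simulate` are hypothetical and not taken from the post or its datasets. It scores the same simulated monitor on a 50/50 split and on a 5/95 split. AUROC should come out roughly the same in both cases, while average precision (a common step-wise estimate of AUPRC) should drop noticeably under imbalance, because a random scorer's AUPRC is about equal to the positive base rate whereas its AUROC is 0.5 regardless.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision at the rank of each positive (ties in scores are not specially handled)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    true_positives = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            total += true_positives / rank
    return total / n_pos


def simulate(n_pos, n_neg, seed=0):
    """Hypothetical monitor: scheming scores ~ N(2, 1), honest scores ~ N(0, 1)."""
    rng = random.Random(seed)
    labels = [1] * n_pos + [0] * n_neg
    scores = [rng.gauss(2.0, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    return labels, scores


for name, (n_pos, n_neg) in {"50/50": (500, 500), "5/95": (50, 950)}.items():
    labels, scores = simulate(n_pos, n_neg)
    base_rate = n_pos / (n_pos + n_neg)
    print(
        f"{name}: base rate={base_rate:.2f} "
        f"AUROC={auroc(labels, scores):.3f} "
        f"AP={average_precision(labels, scores):.3f}"
    )
```

Reporting the base rate alongside either metric lets readers compare the score against the chance-level baseline for that dataset.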
- richbc 31 Jul 2025 12:17 UTC
  1 point
  0
  Thanks for the catch! We’re 50/50 for scheming/honest in every dataset. I’ll make this clearer in the post!