Quick one: when you use AUROC (especially with very high scores!), it's always good to report the base rates. Is the dataset split 50-50 between scheming and non-scheming?
Based on the original control paper, scheming was rare, so ground-truth data would be imbalanced (e.g. 90 or 95% of datapoints are non-scheming). If there is a large imbalance, AUPRC is more insightful.
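To illustrate the point about base rates, here is a minimal sketch on synthetic data (not the scores from the post). The Gaussian score distributions, the sample sizes, and the helper names `auroc` and `average_precision` are all assumptions made for the example; average precision stands in as one common estimate of AUPRC. The imbalanced set reuses the same positives and repeats the negatives 19 times, so the only thing that changes is the base rate (50% vs 5% scheming).

```python
import random

def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of precision@k taken at the rank k of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / tp

# Synthetic monitor scores (assumption): scheming ~ N(2, 1), honest ~ N(0, 1).
rng = random.Random(0)
pos_scores = [rng.gauss(2.0, 1.0) for _ in range(200)]
neg_scores = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Balanced: 200 scheming vs 200 honest (50% base rate).
labels_bal = [1] * 200 + [0] * 200
scores_bal = pos_scores + neg_scores

# Imbalanced: same positives, negatives repeated 19x (5% base rate).
labels_imb = [1] * 200 + [0] * (200 * 19)
scores_imb = pos_scores + neg_scores * 19

auroc_bal = auroc(labels_bal, scores_bal)
auroc_imb = auroc(labels_imb, scores_imb)
ap_bal = average_precision(labels_bal, scores_bal)
ap_imb = average_precision(labels_imb, scores_imb)

print(f"balanced:   AUROC={auroc_bal:.3f}  AP={ap_bal:.3f}")
print(f"imbalanced: AUROC={auroc_imb:.3f}  AP={ap_imb:.3f}")
```

AUROC comes out the same in both cases because it only depends on how positives rank against negatives, while average precision drops once the negatives outnumber the positives. That is why a high AUROC alone says little about precision when scheming is rare.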
Thanks for the catch! We’re 50/50 for scheming/honest in every dataset. I’ll make this clearer in the post!