I agree with all of this! I should have been more exact with my comment here (and to be clear, I don’t think my critique applies at all to Jan’s paper).
One thing I will add: when EM is being demonstrated with a single question, this should be documented. One concern I have with the model organisms of EM paper is that some of these models are more narrowly misaligned (like your “gender roles” example), but the paper only reports aggregate rates. Some readers will assume that a model labeled as 10% EM is more broadly misaligned than it actually is.