This looks basically right, except:
These understanding-evals would focus on how well we can predict models’ behavior
I definitely don’t think this—I explicitly talk about my problems with prediction-based evaluations in the post.
Thanks for the correction. I edited my original comment to reflect it.
This looks basically right, except:
I definitely don’t think this—I explicitly talk about my problems with prediction-based evaluations in the post.
Thanks for the correction. I edited my original comment to reflect it.