evhub comments on Towards understanding-based safety evaluations

evhub 19 Mar 2023 5:26 UTC
4 points
0
This looks basically right, except:

> These understanding-evals would focus on how well we can predict models’ behavior

I definitely don’t think this—I explicitly talk about my problems with prediction-based evaluations in the post.
- Aaron_Scher 19 Mar 2023 7:30 UTC
  1 point
  2
  Thanks for the correction. I edited my original comment to reflect it.