I don’t know. Maybe dangerous capability evals don’t really matter. The usual story for them is that they inform companies about risks so the companies can respond appropriately. Good evals are better than nothing, but I don’t expect companies’ eval results to affect their safeguards or training/deployment decisions much in practice. (I think companies’ safety frameworks are quite weak or at least vague — a topic for another blogpost.) Maybe evals are helpful for informing other actors, like government, but I don’t really see it.
I don’t have a particular conclusion. I’m making aisafetyclaims.org because evals are a crucial part of companies’ preparations to be safe when models might have very dangerous capabilities, but I noticed that the companies are doing and interpreting evals poorly (and are being misleading about this, and aren’t getting better), and some experts are aware of this but nobody else has written it up yet.
> Good evals are better than nothing, but I don’t expect companies’ eval results to affect their safeguards or training/deployment decisions much in practice.

This seems a bit circular: if eval results don’t affect companies’ decisions, in what sense are good evals better than nothing?
Who decides what counts as a “good eval” in the first place, and how is that threshold communicated?