My overall sense is that this behavioral testing will generally be hard. It will probably be a huge mess if we’re extremely rushed and need to do all of this in a few months
Why can’t we do a bunch of the work for this ahead of time? E.g., creating high-effort evaluation datasets for reward models.
Why can’t we do a bunch of the work for this ahead of time? E.g., creating high-effort evaluation datasets for reward models.