I don’t know whether it’s still being tracked, since I left AISI nearly a year ago. But based on previous practice, I’d guess yes (or that a related/derivative suite is), because the standard practice at AISI was for workstreams to maintain suites of evals and run them both periodically and on prerelease models, occasionally publishing results on a fairly ad hoc basis.