Beth Barnes comments on Thoughts on sharing information about language model capabilities

Beth Barnes 7 Aug 2023 2:06 UTC
LW: 6 AF: 5
What we’ve currently published is ‘number of agents that completed each task’, which has a similar effect of making comparisons between models harder. Does that seem like it addresses the downside sufficiently?