Worth noting that the ECI might be biased away from the ways that Claude is good. As per this post by Epoch, the first two PCs of their benchmark data correspond to “general capability” and “claudiness”. ECI is another (different) 1-dimensional compression of the same benchmark data, so to the extent it mostly tracks the “general capability” direction and ignores “claudiness”, it seems like it should also underrate Claude.
h/t @jake_mendel for discussion
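To make the worry concrete, here is a toy sketch of how a 1-dimensional summary can be blind to a second axis. Everything in it is made up for illustration: the four benchmarks, the loadings, the six models, and the use of the first principal component as the index. It is not a reproduction of ECI or of Epoch's analysis.

```python
import numpy as np

# Made-up loadings of four hypothetical benchmarks on two latent factors.
general_loading = np.array([1.0, 1.0, 1.0, 1.0])   # every benchmark rewards general capability
second_loading = np.array([1.0, 1.0, -1.0, -1.0])  # a second axis, orthogonal to the first

# Made-up latent positions (general, second) for six hypothetical models.
latents = np.array([
    [ 2.0,  0.5],
    [ 2.0, -0.5],
    [-2.0,  0.5],
    [-2.0, -0.5],
    [ 0.0,  1.0],  # model_a: strong on the second axis
    [ 0.0, -1.0],  # model_b: weak on the second axis
])

# Benchmark scores: rows are models, columns are benchmarks.
scores = latents @ np.vstack([general_loading, second_loading])

# Assumption for this sketch: the 1-D index is the projection onto the first PC.
centered = scores - scores.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0] if vt[0].sum() > 0 else -vt[0]  # fix the arbitrary sign of the component

index = centered @ pc1

model_a, model_b = 4, 5
print("index:", index[model_a], index[model_b])
print("scores a:", scores[model_a])
print("scores b:", scores[model_b])
```

In this toy setup model_a and model_b get the same index, even though model_a scores higher on the first two benchmarks and lower on the last two. Whether that counts as underrating depends on how much you care about the benchmarks where the second axis shows up, and whether ECI actually behaves like the first PC is the open question.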
Yes, this makes sense.