The ECI suggests that the best open-weight models are trained on ~1 OOM less compute than the best closed-weight ones. I'm not sure what to make of this, if anything.
My guess is that this is mainly due to FLOP data being much more limited for closed models (especially recent ones), and to closed-model developers focusing much less on models with small training FLOP (e.g. <1e25 FLOP).
Another consideration is that we tend to prioritize evaluations on frontier models, so coverage for smaller models is spottier.