10.5-13% on the text-only part of HLE
This is for o3-mini, while the ~25% figure for o3 from the tweet you linked is simply restating deep research evals.