OSWorld tasks aren’t in machine learning or mathematics, so we don’t have much data to go on.
But what we do have suggests a ~4 month doubling time, from which we arrive at an ~8 minute 50% time horizon by EOY. Given the difficulty split:
> # Difficulty Split: Easy (<60s): 28.72%, Medium (60-180s): 40.11%, Hard (>180s): 30.17%
This suggests an OSWorld score above 80% by EOY, though that depends on model release cadence, among other factors.
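
The arithmetic behind that estimate can be sketched roughly as follows. This is a minimal illustration, not the method actually used to produce the numbers above: the logistic-in-log-duration success curve, its `slope`, and the representative per-bucket durations (30 s, 120 s, 300 s) are all assumptions introduced here. Only the bucket shares, the ~4 month doubling time, and the ~8 minute horizon come from the text.

```python
import math


def horizon_after(h0_minutes, months, doubling_months=4.0):
    """Extrapolate a 50% time horizon forward under a fixed doubling time."""
    return h0_minutes * 2.0 ** (months / doubling_months)


def success_prob(task_seconds, horizon_seconds, slope=1.0):
    """ASSUMED success curve: logistic in log2(horizon / task length).

    By construction it returns 0.5 when the task is exactly as long as
    the 50% time horizon. `slope` is a hypothetical steepness parameter.
    """
    x = slope * math.log2(horizon_seconds / task_seconds)
    return 1.0 / (1.0 + math.exp(-x))


# Bucket shares are from the quoted difficulty split; the representative
# durations in seconds are ASSUMED (the split only gives bucket ranges).
SPLIT = {
    "easy": (0.2872, 30.0),     # <60s
    "medium": (0.4011, 120.0),  # 60-180s
    "hard": (0.3017, 300.0),    # >180s
}


def expected_score(horizon_seconds, slope=1.0):
    """Share-weighted success rate across the three difficulty buckets."""
    # The quoted shares sum to 99%, so normalise by their total.
    total_weight = sum(w for w, _ in SPLIT.values())
    weighted = sum(
        w * success_prob(t, horizon_seconds, slope) for w, t in SPLIT.values()
    )
    return weighted / total_weight


if __name__ == "__main__":
    # ~8 minute 50% time horizon, expressed in seconds.
    print(f"expected score: {expected_score(8 * 60):.3f}")
```

The result is sensitive to the assumed slope and to how long the "hard" tasks really run, so it is best read as a consistency check on the 80% figure rather than a forecast in its own right.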