I think it would be pretty exciting to try to get evidence about how much of the METR horizon length progress is coming from intercept rising vs. slope increasing.
The main thing this would require is for them to report how long the AIs take to complete the tasks they're tested on, right? (As opposed to now, when they report how long the humans take but only the pass rate for the AIs.)