Yeah, I tried running the code on SciCode, GPQA, and HLE. Overall, the results were somewhat similar but much noisier. Method 2 gave very similar results, except for lower growth on HLE. Method 1 gave somewhat lower growth rates on SciCode and higher growth rates on GPQA Diamond at the high end (though, given the way I constructed the frontier, there were only two points in the 70+ bucket).