Thomas Kwa comments on METR: How Does Time Horizon Vary Across Domains?

Thomas Kwa 21 Jul 2025 18:54 UTC
The issue with Cybench is that its difficulty annotations are "first solve time", which we don't know how to compare with the median or average solve time among experts. If we get better data, it could become comparable.
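
To illustrate why a first solve time is hard to line up with a median, here is a minimal sketch using synthetic numbers (not Cybench data). It assumes, purely for illustration, that solve times are log-normal with a 60-minute median; the function `simulate_solve_times` and its parameters are hypothetical. The first solve time is the minimum over everyone who attempted the task, so it can only fall as more teams enter, while the median does not have that dependence on field size.

```python
import math
import random
import statistics


def simulate_solve_times(n_teams, median_minutes=60.0, sigma=1.0, seed=0):
    """Draw hypothetical solve times (in minutes) from a log-normal distribution.

    The distribution and its parameters are assumptions for illustration only.
    """
    rng = random.Random(seed)
    mu = math.log(median_minutes)  # the median of a log-normal is exp(mu)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_teams)]


times = simulate_solve_times(1000)

# Compare "first solve time" (the minimum) with the median for nested fields
# of increasing size: the first 10, 100, and 1000 simulated teams.
summary = {}
for n in (10, 100, 1000):
    field = times[:n]
    summary[n] = (min(field), statistics.median(field))
    print(f"teams={n:4d}  first solve={summary[n][0]:6.1f} min  "
          f"median={summary[n][1]:6.1f} min")
```

Without knowing how many teams attempted a task and how their times were distributed, there is no obvious conversion factor from a first solve time to a median.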