Another suggestion: https://cybench.github.io/
The issue with Cybench is that its difficulty annotations are “first solve time,” which we don’t know how to compare with the median or average solve time among experts. If we get better data, it could become comparable.