First blood times represent the time of first successful submission in the originally published competition. While there are some limitations (participants usually compete in teams, and may solve problems in parallel or in sequence), this still provides a useful proxy.
Another limitation is that first blood times represent the fastest time of some group rather than the typical time that an expert would take to complete the task. This makes Cybench times less comparable to other human completion times.
Yes, I agree. Given that these times were my primary anchor for other CTF estimates, this could be a significant contributor to why the SOTA time horizons I found were so much lower than those METR found in their work on software engineering.