SWE-bench Verified: Anthropic claims 80.2% with parallel test-time compute. I think we should count it?