Mr Beastly comments on An Alternate History of the Future, 2025-2040

Mr Beastly 28 Feb 2025 0:20 UTC
1 point
This increases the samples in SWE-Bench Verified significantly, with the number of test cases growing exponentially as the AI itself contributes to the benchmark’s expansion.
“We introduce SWE-Lancer, a benchmark of over 1,400 freelance software engineering tasks from Upwork...” Source: https://arxiv.org/abs/2502.12115