FYI: Jocelyn Qiaochu Chen and collaborators just this morning released VeriSoftBench, “a benchmark for repository-scale Lean 4 verification proofs”, which I think is a really compelling project and highly relevant to the benchmarks/evals section of my post above.