AI eval idea: metabench. Have each participating LLM autonomously design and build its own benchmark. Then run every one of these benchmarks on all participants and sum each model's scores. Compare the resulting ranking with external benchmarks too.
The name metabench is already taken!
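The scoring step of the idea is easy to sketch. The Python below is a minimal illustration that assumes each participant's benchmark produces one numeric score per model. Everything in it is hypothetical and not part of the original note: the names `metabench_totals`, `ranks` and `spearman`, the optional `exclude_own` flag (which drops a model's score on its own benchmark), and the choice of Spearman rank correlation as one possible way to compare the totals with an external leaderboard.

```python
from typing import Dict

# Hypothetical score table: scores[author][model] is the score that `model`
# achieved on the benchmark designed by `author`.
Scores = Dict[str, Dict[str, float]]


def metabench_totals(scores: Scores, exclude_own: bool = False) -> Dict[str, float]:
    """Sum each model's scores across all participant-authored benchmarks.

    exclude_own is a hypothetical option that skips each model's result on
    the benchmark it authored itself.
    """
    totals: Dict[str, float] = {}
    for author, results in scores.items():
        for model, score in results.items():
            if exclude_own and model == author:
                continue
            totals[model] = totals.get(model, 0.0) + score
    return totals


def ranks(values: Dict[str, float]) -> Dict[str, float]:
    """Rank models by value (1 = best), giving tied models their average rank."""
    ordered = sorted(values, key=lambda m: values[m], reverse=True)
    result: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over every model tied with ordered[i].
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            result[ordered[k]] = avg
        i = j + 1
    return result


def spearman(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman rank correlation over the models present in both dicts."""
    common = sorted(set(a) & set(b))
    ra = ranks({m: a[m] for m in common})
    rb = ranks({m: b[m] for m in common})
    mean = (len(common) + 1) / 2  # mean rank, also valid with averaged ties
    cov = sum((ra[m] - mean) * (rb[m] - mean) for m in common)
    var_a = sum((ra[m] - mean) ** 2 for m in common)
    var_b = sum((rb[m] - mean) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        raise ValueError("a ranking is constant; correlation is undefined")
    return cov / (var_a * var_b) ** 0.5


# Made-up example data: three participants, each authoring one benchmark.
scores = {
    "model_a": {"model_a": 0.9, "model_b": 0.6, "model_c": 0.4},
    "model_b": {"model_a": 0.7, "model_b": 0.8, "model_c": 0.5},
    "model_c": {"model_a": 0.6, "model_b": 0.5, "model_c": 0.7},
}
# Made-up external leaderboard scores for the same models.
external = {"model_a": 80.0, "model_b": 75.0, "model_c": 60.0}

totals = metabench_totals(scores)
print(totals)
print(spearman(totals, external))
```

Summing raw scores treats every benchmark as equally weighted, so a benchmark with a wide score spread dominates the total; normalizing each benchmark's scores first would be one alternative.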