Previously at Penn (Logic, Information, and Computation BA) and Safe AI @ Penn.
Currently spending the summer at ERA:AI Cambridge doing Technical AI Governance research in forecasting.
"Establishing Best Practices for Building Rigorous Agentic Benchmarks" by Yuxuan Zhu et al. (July 14th, 2025) covers this problem for agentic benchmarks.