AI Benchmarking (tag)
Last edit: 16 Jul 2023 14:12 UTC by rybolos
MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures
corey morris · 27 Sep 2023 17:54 UTC · 14 points · 2 comments · 4 min read · link post (medium.com)
Broken Benchmark: MMLU
awg · 29 Aug 2023 18:09 UTC · 25 points · 5 comments · 1 min read · link post (www.youtube.com)