AI Benchmarking (tag)
Last edit: 16 Jul 2023 14:12 UTC by rybolos
Broken Benchmark: MMLU
awg, 29 Aug 2023 18:09 UTC, 23 points, 5 comments, 1 min read, LW link (www.youtube.com)
Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols
Arjun Panickssery and agg, 15 Jan 2024 21:21 UTC, 33 points, 0 comments, 1 min read, LW link
LLM Psychometrics: A Speculative Approach to AI Safety
pskl, 29 Jan 2024 18:38 UTC, 3 points, 4 comments, 1 min read, LW link (pascal.cc)
MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures
corey morris, 27 Sep 2023 17:54 UTC, 14 points, 2 comments, 4 min read, LW link (medium.com)