AI Benchmarking

Tag
Last edit: 16 Jul 2023 14:12 UTC by rybolos

MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures

corey morris
27 Sep 2023 17:54 UTC
14 points
2 comments
4 min read
LW link
(medium.com)

Broken Benchmark: MMLU

awg
29 Aug 2023 18:09 UTC
25 points
5 comments
1 min read
LW link
(www.youtube.com)