Laura Domenech

Karma: 19

Research Fellow at the GPAI Policy Lab

Laura Domenech 11 Mar 2026 10:56 UTC
5 points
0
in reply to: Gordon Seidoh Worley’s comment on: Mapping AI Capabilities to Human Expertise on the Rosetta Stone (Epoch Capabilities Index)
Thanks for the comment! It’s an ongoing project so we are definitely open to suggestions.
Regarding your remark: on one hand, it certainly doesn’t mean that humans at these expertise levels can be replaced. See our note: “These benchmarks are simplified proxies, not direct measures of real-world professional competence. Scoring at ‘Domain Expert level’ on a benchmark does not mean models can replace domain experts in their actual work. [etc.]” The TLDR also acknowledges, perhaps too succinctly, that the mapping only applies to “technical and scientific benchmark skills”.
But on the other hand, these are empirical baseline results from real experts who took these benchmarks (with limited time) and scored lower than leading models. So scoring at a Domain Expert level does mean that a model matches or exceeds the accuracy of human experts on these specific, closed-ended sets of benchmark tasks.
For the next iteration we’d like to add more complex benchmarks and additional difficulty axes that should account for other factors like fluid intelligence and real-world messiness. This should help clarify that models are not exceeding experts on all axes.

Mapping AI Capabilities to Human Expertise on the Rosetta Stone (Epoch Capabilities Index)

Laura Domenech and Jérémy Andréoletti

9 Mar 2026 17:09 UTC

18 points

2 comments, 6 min read

Laura Domenech 3 Mar 2026 14:09 UTC
2 points
0
on: Jailbreaking is Empirical Evidence for Inner Misalignment and Against Alignment by Default
Great post, thank you. It helped me see the “patch → new jailbreak → repeat” pattern as support for the claim that current models fail to really internalize reinforced human values.