RSS

METR (org)

TagLast edit: 1 Jul 2024 18:47 UTC by Ruby

Formerly ARC Evals

ARC Evals new re­port: Eval­u­at­ing Lan­guage-Model Agents on Real­is­tic Au­tonomous Tasks

Beth Barnes1 Aug 2023 18:30 UTC
153 points
12 comments5 min readLW link
(evals.alignment.org)

ARC Evals: Re­spon­si­ble Scal­ing Policies

Zach Stein-Perlman28 Sep 2023 4:30 UTC
40 points
9 comments2 min readLW link
(evals.alignment.org)

METR is hiring!

Beth Barnes26 Dec 2023 21:00 UTC
65 points
1 comment1 min readLW link

Re­view of METR’s pub­lic eval­u­a­tion protocol

30 Jun 2024 22:03 UTC
10 points
0 comments5 min readLW link
No comments.