METR (org)

Tag · Last edit: 1 Jul 2024 18:47 UTC by Ruby

Formerly ARC Evals

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

habryka, 11 Jul 2025 0:23 UTC

97 points

43 comments · 6 min read

(metr.org)

Review of METR’s public evaluation protocol

nahoj and JaimeRV

30 Jun 2024 22:03 UTC

10 points

0 comments · 5 min read

METR’s Observations of Reward Hacking in Recent Frontier Models

Daniel Kokotajlo, 9 Jun 2025 18:03 UTC

100 points

9 comments · 11 min read

(metr.org)

AXRP Episode 47 - David Rein on METR Time Horizons

DanielFilan, 3 Jan 2026 0:10 UTC

21 points

0 comments · 46 min read

Interpreting the METR Time Horizons Post

snewman, 30 Apr 2025 3:03 UTC

70 points

13 comments · 10 min read

(amistrongeryet.substack.com)

Improved visualizations of METR Time Horizons paper.

LDJ, 19 Mar 2025 23:36 UTC

30 points

4 comments · 2 min read

METR’s Evaluation of GPT-5

GradientDissenter, 7 Aug 2025 22:17 UTC

145 points

15 comments · 20 min read

(metr.github.io)

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes, 1 Aug 2023 18:30 UTC

153 points

12 comments · 5 min read

(evals.alignment.org)

METR is hiring ML Research Engineers and Scientists

Xodarap, 5 Jun 2024 21:27 UTC

5 points

0 comments · 1 min read

(metr.org)

CoT May Be Highly Informative Despite “Unfaithfulness” [METR]

GradientDissenter, 11 Aug 2025 21:47 UTC

64 points

3 comments · 24 min read

(metr.org)

METR is hiring!

Beth Barnes, 26 Dec 2023 21:00 UTC

65 points

1 comment · 1 min read

Estimating METR Time Horizons for Claude Opus 4.6 and GPT 5.3 Codex (xhigh)

CharlesD, 16 Feb 2026 18:14 UTC

33 points

6 comments · 3 min read

You’re gonna need a bigger boat (benchmark), METR

Eye You and frmsaul

13 Apr 2026 2:55 UTC

20 points

7 comments · 3 min read

Clarifying METR’s Auditing Role

Beth Barnes, 30 May 2024 18:41 UTC

108 points

1 comment · 2 min read

METR: Measuring AI Ability to Complete Long Tasks

Zach Stein-Perlman, 19 Mar 2025 16:00 UTC

243 points

106 comments · 5 min read

(metr.org)

Is METR Underestimating LLM Time Horizons?

andreasrobinson, 18 Jan 2026 1:19 UTC

40 points

6 comments · 17 min read

Why Future AIs will Require New Alignment Methods

Alvin Ånestrand, 10 Oct 2025 14:27 UTC

17 points

7 comments · 5 min read

(forecastingaifutures.substack.com)

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King, 20 Mar 2025 15:58 UTC

20 points

21 comments · 1 min read

Introducing METR’s Autonomy Evaluation Resources

Megan Kinniment and Beth Barnes

15 Mar 2024 23:16 UTC

90 points

0 comments · 1 min read

(metr.github.io)

METR’s preliminary evaluation of o3 and o4-mini

Christopher King, 16 Apr 2025 20:23 UTC

14 points

7 comments · 1 min read

(metr.github.io)

Reactions to METR task length paper are insane

Cole Wyeth, 10 Apr 2025 17:13 UTC

67 points

43 comments · 4 min read

METR have released Time Horizons 1.1

Sean Herrington, 3 Feb 2026 19:48 UTC

33 points

0 comments · 1 min read

(metr.org)

The Prompt Is the Tell, Not the Reasoning Trace—Eval Awareness

ratnaditya@gmail.com, 18 May 2026 5:12 UTC

1 point

0 comments · 8 min read

METR: AI models can be dangerous before public deployment

UnofficialLinkpostBot, 26 Feb 2025 20:19 UTC

16 points

0 comments · 3 min read

(metr.org)

ARC Evals: Responsible Scaling Policies

Zach Stein-Perlman, 28 Sep 2023 4:30 UTC

40 points

10 comments · 2 min read · 1 review

(evals.alignment.org)
