Thomas Kwa

Karma: 7,121

Member of technical staff at METR.

Previously: MIRI interp with Adrià and Jason → METR.

I have signed no contracts or agreements whose existence I cannot mention.

Claude, GPT, and Gemini All Struggle to Evade Monitors

Aug 6, 2025, 8:28 PM
61 points

22 votes

3 comments · 5 min read · LW link

METR: How Does Time Horizon Vary Across Domains?

Jul 14, 2025, 4:13 PM
84 points

30 votes

8 comments · 14 min read · LW link
(metr.org)

Tsinghua paper: Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Thomas Kwa · May 5, 2025, 6:56 PM
69 points

36 votes

21 comments · 2 min read · LW link
(arxiv.org)

Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

Thomas Kwa · Nov 6, 2024, 11:01 PM
115 points

60 votes

35 comments · 1 min read · LW link

The murderous shortcut: a toy model of instrumental convergence

Thomas Kwa · Oct 2, 2024, 6:48 AM
37 points

17 votes

0 comments · 2 min read · LW link

Goodhart in RL with KL: Appendix

Thomas Kwa · May 18, 2024, 12:40 AM
12 points

5 votes

0 comments · 6 min read · LW link

Catastrophic Goodhart in RL with KL penalty

May 15, 2024, 12:58 AM
62 points

19 votes

10 comments · 7 min read · LW link

[Question] Is a random box of gas predictable after 20 seconds?

Jan 24, 2024, 11:00 PM
38 points

14 votes

35 comments · 1 min read · LW link

[Question] Will quantum randomness affect the 2028 election?

Jan 24, 2024, 10:54 PM
66 points

26 votes

52 comments · 1 min read · LW link

Thomas Kwa’s research journal

Nov 23, 2023, 5:11 AM
79 points

37 votes

1 comment · 6 min read · LW link

Thomas Kwa’s MIRI research experience

Oct 2, 2023, 4:42 PM
174 points

94 votes

53 comments · 1 min read · LW link

Catastrophic Regressional Goodhart: Appendix

May 15, 2023, 12:10 AM
25 points

11 votes

1 comment · 9 min read · LW link

When is Goodhart catastrophic?

May 9, 2023, 3:59 AM
180 points

68 votes

30 comments · 8 min read · LW link · 1 review

Challenge: construct a Gradient Hacker

Mar 9, 2023, 2:38 AM
39 points

18 votes

10 comments · 1 min read · LW link

Failure modes in a shard theory alignment plan

Thomas Kwa · Sep 27, 2022, 10:34 PM
26 points

12 votes

2 comments · 7 min read · LW link

Utility functions and probabilities are entangled

Thomas Kwa · Jul 26, 2022, 5:36 AM
15 points

7 votes

5 comments · 1 min read · LW link

Deriving Conditional Expected Utility from Pareto-Efficient Decisions

Thomas Kwa · May 5, 2022, 3:21 AM
24 points

8 votes

1 comment · 6 min read · LW link

Most problems don’t differ dramatically in tractability (under certain assumptions)

Thomas Kwa · May 4, 2022, 12:05 AM
8 points

3 votes

0 comments · 3 min read · LW link

The case for turning glowfic into Sequences

Thomas Kwa · Apr 27, 2022, 6:58 AM
88 points

47 votes

29 comments · 5 min read · LW link

[Question] (When) do high-dimensional spaces have linear paths down to local minima?

Thomas Kwa · Apr 22, 2022, 3:35 PM
12 points

5 votes

7 comments · 1 min read · LW link