Tomek Korbak

Karma: 1,144

I work on monitoring agents at OpenAI.

https://tomekkorbak.com/

Reasoning Models Struggle to Control Their Chains of Thought

5 Mar 2026 22:37 UTC
74 points
8 comments · 3 min read · LW link

Training Agents to Self-Report Misbehavior

25 Feb 2026 17:50 UTC
26 points
0 comments · 8 min read · LW link

Lessons from Studying Two-Hop Latent Reasoning

11 Sep 2025 17:53 UTC
68 points
19 comments · 2 min read · LW link
(arxiv.org)

If you can generate obfuscated chain-of-thought, can you monitor it?

4 Aug 2025 15:46 UTC
36 points
6 comments · 11 min read · LW link

Research Areas in AI Control (The Alignment Project by UK AISI)

1 Aug 2025 10:27 UTC
25 points
0 comments · 18 min read · LW link
(alignmentproject.aisi.gov.uk)