TurnTrout

Karma: 21,252

I don’t use LessWrong much anymore. Find me at www.turntrout.com.

My name is Alex Turner. I’m a research scientist at Google DeepMind on the Scalable Alignment team. My views are strictly my own; I do not represent Google. Reach me at alex[at]turntrout.com

Training a Reward Hacker Despite Perfect Labels

14 Aug 2025 23:57 UTC
127 points
45 comments
4 min read

Optimizing The Final Output Can Obfuscate CoT (Research Note)

30 Jul 2025 21:26 UTC
195 points
22 comments
6 min read

English writes numbers backwards

TurnTrout, 25 Jul 2025 23:00 UTC
8 points
23 comments
12 min read
(turntrout.com)

We Built a Tool to Protect Your Dataset From Simple Scrapers

25 Jul 2025 5:44 UTC
55 points
9 comments
3 min read

A Simple Explanation of AGI Risk

TurnTrout, 1 Jul 2025 16:18 UTC
66 points
4 comments
5 min read
(turntrout.com)

Authors Have a Responsibility to Communicate Clearly

TurnTrout, 1 Jul 2025 15:41 UTC
125 points
29 comments
6 min read
(turntrout.com)