
TurnTrout

Karma: 21,954

I don’t use LessWrong much anymore. Find me at www.turntrout.com.

My name is Alex Turner. I’m a research scientist at Google DeepMind on the Scalable Alignment team. My views are strictly my own; I do not represent Google. Reach me at alex[at]turntrout.com

Apply for Alignment Mentorship from TurnTrout and Alex Cloud

26 Dec 2025 17:20 UTC
46 points
0 comments, 2 min read
(turntrout.com)

2025-Era “Reward Hacking” Does Not Show that Reward Is the Optimization Target

TurnTrout, 19 Dec 2025 6:09 UTC
46 points
9 comments, 7 min read
(turntrout.com)

Automatic alt text generation

TurnTrout, 22 Nov 2025 17:57 UTC
27 points
1 comment, 1 min read
(turntrout.com)

[Paper] Output Supervision Can Obfuscate the CoT

20 Nov 2025 22:41 UTC
75 points
3 comments, 5 min read
(arxiv.org)

GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash

4 Nov 2025 16:25 UTC
53 points
2 comments, 6 min read
(arxiv.org)