
TurnTrout

Karma: 21,911

I don’t use LessWrong much anymore. Find me at www.turntrout.com.

My name is Alex Turner. I’m a research scientist at Google DeepMind on the Scalable Alignment team. My views are strictly my own; I do not represent Google. Reach me at alex[at]turntrout.com

Team Shard: Alignment Mentorship from TurnTrout and Alex Cloud

26 Dec 2025 17:20 UTC
37 points
0 comments · 2 min read · LW link
(turntrout.com)

2025-Era “Reward Hacking” Does Not Show that Reward Is the Optimization Target

TurnTrout, 19 Dec 2025 6:09 UTC
46 points
9 comments · 7 min read · LW link
(turntrout.com)

Automatic alt text generation

TurnTrout, 22 Nov 2025 17:57 UTC
27 points
1 comment · 1 min read · LW link
(turntrout.com)

[Paper] Output Supervision Can Obfuscate the CoT

20 Nov 2025 22:41 UTC
75 points
3 comments · 5 min read · LW link
(arxiv.org)

GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash

4 Nov 2025 16:25 UTC
51 points
2 comments · 6 min read · LW link
(arxiv.org)

An Opinionated Guide to Privacy Despite Authoritarianism

TurnTrout, 29 Oct 2025 20:32 UTC
180 points
28 comments · 4 min read · LW link
(turntrout.com)