
TurnTrout

Karma: 17,401

My name is Alex Turner. I’m a research scientist at Google DeepMind on the Scalable Alignment team. My views are strictly my own; I do not represent Google. Reach me at alex[at]turntrout.com.

Many arguments for AI x-risk are wrong

TurnTrout, 5 Mar 2024 2:31 UTC
151 points
76 comments, 12 min read, LW link

Dreams of AI alignment: The danger of suggestive names

TurnTrout, 10 Feb 2024 1:22 UTC
92 points
58 comments, 4 min read, LW link

Steering Llama-2 with contrastive activation additions

2 Jan 2024 0:47 UTC
118 points
29 comments, 8 min read, LW link
(arxiv.org)

How should TurnTrout handle his DeepMind equity situation?

16 Oct 2023 18:25 UTC
61 points
27 comments, 6 min read, LW link

Paper: Understanding and Controlling a Maze-Solving Policy Network

13 Oct 2023 1:38 UTC
69 points
0 comments, 1 min read, LW link
(arxiv.org)

AI presidents discuss AI alignment agendas

9 Sep 2023 18:55 UTC
213 points
22 comments, 1 min read, LW link
(www.youtube.com)

ActAdd: Steering Language Models without Optimization

6 Sep 2023 17:21 UTC
105 points
3 comments, 2 min read, LW link
(arxiv.org)

Open problems in activation engineering

24 Jul 2023 19:46 UTC
43 points
2 comments, 1 min read, LW link
(coda.io)

Ban development of unpredictable powerful models?

TurnTrout, 20 Jun 2023 1:43 UTC
46 points
25 comments, 4 min read, LW link

Mode collapse in RL may be fueled by the update equation

19 Jun 2023 21:51 UTC
49 points
10 comments, 8 min read, LW link

Think carefully before calling RL policies “agents”

TurnTrout, 2 Jun 2023 3:46 UTC
124 points
35 comments, 4 min read, LW link

Steering GPT-2-XL by adding an activation vector

13 May 2023 18:42 UTC
416 points
97 comments, 50 min read, LW link

Residual stream norms grow exponentially over the forward pass

7 May 2023 0:46 UTC
72 points
24 comments, 11 min read, LW link

Behavioural statistics for a maze-solving agent

20 Apr 2023 22:26 UTC
44 points
11 comments, 10 min read, LW link

[April Fools’] Definitive confirmation of shard theory

TurnTrout, 1 Apr 2023 7:27 UTC
166 points
7 comments, 2 min read, LW link

Maze-solving agents: Add a top-right vector, make the agent go to the top-right

31 Mar 2023 19:20 UTC
101 points
17 comments, 11 min read, LW link

Understanding and controlling a maze-solving policy network

11 Mar 2023 18:59 UTC
312 points
22 comments, 23 min read, LW link

Predictions for shard theory mechanistic interpretability results

1 Mar 2023 5:16 UTC
105 points
10 comments, 5 min read, LW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout, 18 Feb 2023 18:41 UTC
166 points
9 comments, 2 min read, LW link
(arxiv.org)

Some of my disagreements with List of Lethalities

TurnTrout, 24 Jan 2023 0:25 UTC
68 points
7 comments, 10 min read, LW link