David Udell

Karma: 2,592

(Not) Explaining GPT-2-Small Forward Passes with Edge-Level Autoencoder Circuits

Jul 22, 2025, 8:36 PM
23 points

8 votes

Overall karma indicates overall quality.

0 comments · 6 min read · LW link

Why Can’t We Hypothesize After the Fact?

David Udell · Feb 26, 2025, 10:41 PM
40 points

15 votes

3 comments · 2 min read · LW link

Causal Graphs of GPT-2-Small’s Residual Stream

David Udell · Jul 9, 2024, 10:06 PM
53 points

18 votes

7 comments · 7 min read · LW link

Sparse Coding, for Mechanistic Interpretability and Activation Engineering

David Udell · Sep 23, 2023, 7:16 PM
42 points

19 votes

7 comments · 34 min read · LW link

ActAdd: Steering Language Models without Optimization

Sep 6, 2023, 5:21 PM
105 points

31 votes

3 comments · 2 min read · LW link
(arxiv.org)

Steering GPT-2-XL by adding an activation vector

May 13, 2023, 6:42 PM
439 points

206 votes

98 comments · 50 min read · LW link · 1 review

Understanding and controlling a maze-solving policy network

Mar 11, 2023, 6:59 PM
334 points

130 votes

28 comments · 23 min read · LW link

Beneath My Epistemic Dignity

David Udell · Feb 28, 2023, 4:02 AM
6 points

10 votes

3 comments · 2 min read · LW link

Probability Theory: The Logic of Science, Jaynes

David Udell · Feb 16, 2023, 9:57 PM
29 points

13 votes

0 comments · 18 min read · LW link

Rounding Someone Off

David Udell · Jan 24, 2023, 12:03 AM
25 points

10 votes

0 comments · 5 min read · LW link

Consequentialists: One-Way Pattern Traps

David Udell · Jan 16, 2023, 8:48 PM
59 points

28 votes

3 comments · 14 min read · LW link

Linear Algebra Done Right, Axler

David Udell · Jan 2, 2023, 10:54 PM
57 points

23 votes

6 comments · 9 min read · LW link

Naive Set Theory, Halmos

David Udell · Dec 22, 2022, 2:34 AM
11 points

4 votes

1 comment · 8 min read · LW link

Moorean Statements

David Udell · Oct 22, 2022, 12:50 AM
11 points

4 votes

11 comments · 1 min read · LW link

Dath Ilan’s Views on Stopgap Corrigibility

David Udell · Sep 22, 2022, 4:16 PM
78 points

28 votes

19 comments · 13 min read · LW link
(www.glowfic.com)

Guidelines for Mad Entrepreneurs

David Udell · Sep 16, 2022, 6:33 AM
31 points

13 votes

0 comments · 11 min read · LW link

Framing AI Childhoods

David Udell · Sep 6, 2022, 11:40 PM
37 points

13 votes

8 comments · 4 min read · LW link

The Shard Theory Alignment Scheme

David Udell · Aug 25, 2022, 4:52 AM
47 points

18 votes

32 comments · 2 min read · LW link

“What Mistakes Are You Making Right Now?”

David Udell · Aug 15, 2022, 9:19 PM
13 points

5 votes

2 comments · 1 min read · LW link

Shard Theory: An Overview

David Udell · Aug 11, 2022, 5:44 AM
167 points

75 votes

34 comments · 10 min read · LW link