Arthur Conmy

Karma: 954

Interpretability

Views my own

[Full Post] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
69 points
8 comments · 8 min read · LW link

[Summary] Progress Update #1 from the GDM Mech Interp Team

19 Apr 2024 19:06 UTC
66 points
0 comments · 3 min read · LW link

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To

6 Mar 2024 5:03 UTC
56 points
0 comments · 12 min read · LW link

Attention SAEs Scale to GPT-2 Small

3 Feb 2024 6:50 UTC
76 points
4 comments · 8 min read · LW link

Sparse Autoencoders Work on Attention Layer Outputs

16 Jan 2024 0:26 UTC
82 points
5 comments · 19 min read · LW link

My best guess at the important tricks for training 1L SAEs

Arthur Conmy · 21 Dec 2023 1:59 UTC
35 points
4 comments · 3 min read · LW link

[Paper] All’s Fair In Love And Love: Copy Suppression in GPT-2 Small

13 Oct 2023 18:32 UTC
82 points
4 comments · 8 min read · LW link

Three ways interpretability could be impactful

Arthur Conmy · 18 Sep 2023 1:02 UTC
47 points
8 comments · 4 min read · LW link

Mechanistically interpreting time in GPT-2 small

16 Apr 2023 17:57 UTC
68 points
6 comments · 21 min read · LW link

RLHF does not appear to differentially cause mode-collapse

20 Mar 2023 15:39 UTC
95 points
9 comments · 3 min read · LW link

OpenAI introduce ChatGPT API at 1/10th the previous $/token

Arthur Conmy · 1 Mar 2023 20:48 UTC
28 points
4 comments · 1 min read · LW link
(openai.com)

Arthur Conmy’s Shortform

Arthur Conmy · 1 Nov 2022 21:35 UTC
2 points
1 comment · 1 min read · LW link

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

28 Oct 2022 23:55 UTC
99 points
9 comments · 9 min read · LW link · 2 reviews
(arxiv.org)