Hoagy

Karma: 900

Some additional SAE thoughts

Hoagy 13 Jan 2024 19:31 UTC
28 points
4 comments, 13 min read

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

21 Sep 2023 15:30 UTC
154 points
7 comments, 5 min read

AutoInterpretation Finds Sparse Coding Beats Alternatives

Hoagy 17 Jul 2023 1:41 UTC
54 points
1 comment, 7 min read

[Replication] Conjecture’s Sparse Coding in Small Transformers

16 Jun 2023 18:02 UTC
52 points
0 comments, 5 min read

[Replication] Conjecture’s Sparse Coding in Toy Models

2 Jun 2023 17:34 UTC
23 points
0 comments, 1 min read

Universality and Hidden Information in Concept Bottleneck Models

Hoagy 5 Apr 2023 14:00 UTC
23 points
0 comments, 11 min read

Nokens: A potential method of investigating glitch tokens

Hoagy 15 Mar 2023 16:23 UTC
20 points
0 comments, 4 min read

Automating Consistency

Hoagy 17 Feb 2023 13:24 UTC
10 points
0 comments, 1 min read

Distilled Representations Research Agenda

18 Oct 2022 20:59 UTC
15 points
2 comments, 8 min read

Remaking EfficientZero (as best I can)

Hoagy 4 Jul 2022 11:03 UTC
36 points
9 comments, 22 min read

Note-Taking without Hidden Messages

Hoagy 30 Apr 2022 11:15 UTC
17 points
2 comments, 4 min read

ELK Sub—Note-taking in internal rollouts

Hoagy 9 Mar 2022 17:23 UTC
6 points
0 comments, 5 min read

Automated Fact Checking: A Look at the Field

Hoagy 6 Oct 2021 23:52 UTC
12 points
0 comments, 8 min read

Hoagy’s Shortform

Hoagy 21 Sep 2020 22:00 UTC
3 points
12 comments, 1 min read

Safe Scrambling?

Hoagy 29 Aug 2020 14:31 UTC
3 points
1 comment, 2 min read

When do utility functions constrain?

Hoagy 23 Aug 2019 17:19 UTC
29 points
7 comments, 7 min read