
Sid Black

Karma: 772

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

28 Nov 2022 12:54 UTC
195 points
33 comments, 31 min read

Conjecture Second Hiring Round

23 Nov 2022 17:11 UTC
92 points
0 comments, 1 min read

Conjecture: a retrospective after 8 months of work

23 Nov 2022 17:10 UTC
185 points
9 comments, 8 min read

Current themes in mechanistic interpretability research

16 Nov 2022 14:14 UTC
89 points
2 comments, 12 min read

Interpreting Neural Networks through the Polytope Lens

23 Sep 2022 17:58 UTC
136 points
29 comments, 33 min read

Conjecture: Internal Infohazard Policy

29 Jul 2022 19:07 UTC
131 points
6 comments, 19 min read