
StefanHex

Karma: 546

Stefan Heimersheim. Research Scientist at Apollo Research, Mechanistic Interpretability.

How to use and interpret activation patching

24 Apr 2024 8:35 UTC
9 points
0 comments, 18 min read

Polysemantic Attention Head in a 4-Layer Transformer

9 Nov 2023 16:16 UTC
46 points
0 comments, 6 min read

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2

25 May 2023 15:37 UTC
71 points
1 comment, 13 min read

Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1

9 May 2023 19:41 UTC
119 points
1 comment, 10 min read

Residual stream norms grow exponentially over the forward pass

7 May 2023 0:46 UTC
72 points
24 comments, 11 min read