RSS

StefanHex

Karma: 540

Stefan Heimersheim. Research Scientist at Apollo Research, Mechanistic Interpretability.

CNN fea­ture vi­su­al­iza­tion in 50 lines of code

StefanHex26 May 2022 11:02 UTC
17 points
4 comments5 min readLW link

Re­search Ques­tions from Stained Glass Windows

StefanHex8 Jun 2022 12:38 UTC
4 points
0 comments2 min readLW link

Re­in­force­ment Learn­ing Goal Mis­gen­er­al­iza­tion: Can we guess what kind of goals are se­lected by de­fault?

25 Oct 2022 20:48 UTC
14 points
2 comments4 min readLW link

How-to Trans­former Mechanis­tic In­ter­pretabil­ity—in 50 lines of code or less!

StefanHex24 Jan 2023 18:45 UTC
47 points
5 comments13 min readLW link

A cir­cuit for Python doc­strings in a 4-layer at­ten­tion-only transformer

20 Feb 2023 19:35 UTC
91 points
6 comments21 min readLW link

Resi­d­ual stream norms grow ex­po­nen­tially over the for­ward pass

7 May 2023 0:46 UTC
72 points
24 comments11 min readLW link