StefanHex

Karma: 2,141

Stefan Heimersheim. Mechanistic interpretability & AI safety researcher, previously at FAR.AI and Apollo Research. The opinions expressed here are my own and do not necessarily reflect the views of my employer.

Finding features in Transformers: Contrastive directions elicit stronger low-level perturbation responses than baselines

20 Mar 2026 21:09 UTC
34 points
1 comment, 6 min read

Activation Plateaus: Where and How They Emerge

17 Oct 2025 5:48 UTC
37 points
0 comments, 8 min read

Transformers Don’t Need LayerNorm at Inference Time: Implications for Interpretability

23 Jul 2025 14:55 UTC
31 points
0 comments, 7 min read