StefanHex

Karma: 1,994

Stefan Heimersheim. Interpretability researcher at FAR.AI, previously Apollo Research. The opinions expressed here are my own and do not necessarily reflect the views of my employer.

Activation Plateaus: Where and How They Emerge

17 Oct 2025 5:48 UTC
36 points
0 comments · 8 min read · LW link

Transformers Don’t Need LayerNorm at Inference Time: Implications for Interpretability

23 Jul 2025 14:55 UTC
31 points
0 comments · 7 min read · LW link

Trusted monitoring, but with deception probes.

23 Jul 2025 5:26 UTC
31 points
0 comments · 4 min read · LW link
(arxiv.org)