StefanHex

Karma: 1,955

Stefan Heimersheim. Interpretability researcher at FAR.AI, previously Apollo Research. The opinions expressed here are my own and do not necessarily reflect the views of my employer.

Transformers Don’t Need LayerNorm at Inference Time: Implications for Interpretability

23 Jul 2025 14:55 UTC
31 points
0 comments · 7 min read · LW link

Trusted monitoring, but with deception probes.

23 Jul 2025 5:26 UTC
31 points
0 comments · 4 min read · LW link
(arxiv.org)