submarat
Karma: 51
SWE/MLE/AI Safety
Transformers Don’t Need LayerNorm at Inference Time: Implications for Interpretability
submarat, Joachim Schaeffer, Luca Baroni, galvsk and StefanHex
23 Jul 2025 14:55 UTC, 31 points, 0 comments, 7 min read
ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour and submarat
5 Oct 2024 11:30 UTC, 34 points, 2 comments, 8 min read