submarat

Karma: 51

SWE/MLE/AI Safety

Transformers Don’t Need LayerNorm at Inference Time: Implications for Interpretability

23 Jul 2025 14:55 UTC
31 points
0 comments · 7 min read

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct

5 Oct 2024 11:30 UTC
34 points
2 comments · 8 min read