Rohin Shah

Karma: 14,372

Research Scientist at DeepMind. Creator of the Alignment Newsletter. http://rohinshah.com/

Improving Dictionary Learning with Gated Sparse Autoencoders

25 Apr 2024 18:43 UTC
62 points
35 comments, 1 min read, LW link
(arxiv.org)

AtP*: An efficient and scalable method for localizing LLM behaviour to components

18 Mar 2024 17:28 UTC
19 points
0 comments, 1 min read, LW link
(arxiv.org)