Rohin Shah

Karma: 14,516

Research Scientist at DeepMind. Creator of the Alignment Newsletter. http://rohinshah.com/

On scalable oversight with weak LLMs judging strong LLMs

8 Jul 2024 8:59 UTC
48 points
18 comments
7 min read
LW link
(arxiv.org)

Improving Dictionary Learning with Gated Sparse Autoencoders

25 Apr 2024 18:43 UTC
63 points
38 comments
1 min read
LW link
(arxiv.org)