
Rohin Shah

Karma: 14,147

Research Scientist at DeepMind. Creator of the Alignment Newsletter. http://rohinshah.com/

AtP*: An efficient and scalable method for localizing LLM behaviour to components

18 Mar 2024 17:28 UTC
9 points
0 comments
1 min read
LW link (arxiv.org)

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)

23 Dec 2023 2:46 UTC
18 points
0 comments
4 min read
LW link

Fact Finding: How to Think About Interpreting Memorisation (Post 4)

23 Dec 2023 2:46 UTC
22 points
0 comments
9 min read
LW link

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3)

23 Dec 2023 2:46 UTC
9 points
0 comments
16 min read
LW link

Fact Finding: Simplifying the Circuit (Post 2)

23 Dec 2023 2:45 UTC
18 points
3 comments
14 min read
LW link

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

23 Dec 2023 2:44 UTC
106 points
4 comments
22 min read
LW link