János Kramár

Karma: 206

AtP*: An efficient and scalable method for localizing LLM behaviour to components

18 Mar 2024 17:28 UTC
19 points
0 comments · 1 min read · LW link
(arxiv.org)

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)

23 Dec 2023 2:46 UTC
18 points
0 comments · 4 min read · LW link

Fact Finding: How to Think About Interpreting Memorisation (Post 4)

23 Dec 2023 2:46 UTC
22 points
0 comments · 9 min read · LW link

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3)

23 Dec 2023 2:46 UTC
9 points
0 comments · 16 min read · LW link

Fact Finding: Simplifying the Circuit (Post 2)

23 Dec 2023 2:45 UTC
18 points
3 comments · 14 min read · LW link

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)

23 Dec 2023 2:44 UTC
106 points
4 comments · 22 min read · LW link

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

20 Jul 2023 10:50 UTC
43 points
3 comments · 2 min read · LW link
(arxiv.org)

Infinite Modal Combat: some observations

János Kramár · 29 Jul 2015 4:05 UTC
3 points
0 comments · 3 min read · LW link

A tractable, interpretable formulation of approximate conditioning for pairwise-specified probability distributions over truth values

János Kramár · 3 Jun 2015 19:08 UTC
3 points
3 comments · 2 min read · LW link