Adam Karvonen

Karma: 286

Evaluating Sparse Autoencoders with Board Game Models

2 Aug 2024 19:50 UTC
38 points
1 comment · 9 min read · LW link

Using an LLM perplexity filter to detect weight exfiltration

Adam Karvonen · 21 Jul 2024 18:18 UTC
25 points
11 comments · 2 min read · LW link

OthelloGPT learned a bag of heuristics

2 Jul 2024 9:12 UTC
108 points
10 comments · 9 min read · LW link

An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs

Adam Karvonen · 25 Jun 2024 15:57 UTC
25 points
0 comments · 9 min read · LW link
(adamkarvonen.github.io)

A Chess-GPT Linear Emergent World Representation

Adam Karvonen · 8 Feb 2024 4:25 UTC
102 points
14 comments · 7 min read · LW link
(adamkarvonen.github.io)