Can

Karma: 184

Evaluating Sparse Autoencoders with Board Game Models

2 Aug 2024 19:50 UTC
25 points
1 comment, 9 min read, LW link

OthelloGPT learned a bag of heuristics

2 Jul 2024 9:12 UTC
108 points
10 comments, 9 min read, LW link

Past Tense Features

20 Apr 2024 14:34 UTC
12 points
0 comments, 4 min read, LW link

An adversarial example for Direct Logit Attribution: memory management in gelu-4l

30 Aug 2023 17:36 UTC
17 points
0 comments, 8 min read, LW link
(arxiv.org)

Understanding mesa-optimization using toy models

7 May 2023 17:00 UTC
42 points
2 comments, 10 min read, LW link

Safety of Self-Assembled Neuromorphic Hardware

26 Dec 2022 18:51 UTC
15 points
2 comments, 10 min read, LW link
(forum.effectivealtruism.org)