Andy Arditi

Karma: 633

https://andyrdt.com

Do models say what they learn?

22 Mar 2025 15:19 UTC
115 points
12 comments · 13 min read · LW link

Finding Features Causally Upstream of Refusal

14 Jan 2025 2:30 UTC
48 points
5 comments · 12 min read · LW link

AI as systems, not just models

Andy Arditi · 21 Dec 2024 23:19 UTC
28 points
0 comments · 7 min read · LW link
(andyrdt.com)

Unlearning via RMU is mostly shallow

23 Jul 2024 16:07 UTC
54 points
3 comments · 6 min read · LW link