Logan Riggs

Karma: 2,404

Interpreting Preference Models w/ Sparse Autoencoders

1 Jul 2024 21:35 UTC
71 points
12 comments · 9 min read