Nina Rimsky

Karma: 1,162

https://ninarimsky.substack.com/

https://ninarimsky.com/

A framing for interpretability

Nina Rimsky, 14 Nov 2023 16:14 UTC
66 points
5 comments, 4 min read
(ninarimsky.substack.com)

Comparing representation vectors between llama 2 base and chat

Nina Rimsky, 28 Oct 2023 22:54 UTC
33 points
4 comments, 2 min read

Investigating the learning coefficient of modular addition: hackathon project

17 Oct 2023 19:51 UTC
78 points
4 comments, 12 min read

Influence functions—why, what and how

Nina Rimsky, 15 Sep 2023 20:42 UTC
64 points
5 comments, 8 min read

Red-teaming language models via activation engineering

Nina Rimsky, 26 Aug 2023 5:52 UTC
61 points
5 comments, 9 min read

The Low-Hanging Fruit Prior and sloped valleys in the loss landscape

23 Aug 2023 21:12 UTC
79 points
1 comment, 13 min read

Understanding and visualizing sycophancy datasets

Nina Rimsky, 16 Aug 2023 5:34 UTC
45 points
0 comments, 6 min read