I am increasingly inclined to look into treating model representations as directions in activation space rather than individual neurons; that may be where I can uncover more about mechanistic interpretability. Wondering if there could be "feature directions" corresponding to when a model goes off the rails, generates unsafe code, or shows jailbreak-like behavior.
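As a toy sketch of what finding such a "feature direction" might look like: one common approach is a difference-of-means probe, where you average activations over prompts that do and don't trigger the behavior and take the difference as a candidate direction. Everything below is synthetic and hypothetical (random vectors standing in for real model activations, an invented shift along a planted direction), just to illustrate the geometry.

```python
import numpy as np

# Hypothetical setup: pretend these are residual-stream activations at
# one layer, collected for "safe" vs "unsafe" prompts. The data is
# synthetic; a planted direction stands in for a real behavioral feature.
rng = np.random.default_rng(0)
d_model = 64

true_direction = rng.normal(size=d_model)
true_direction /= np.linalg.norm(true_direction)

safe_acts = rng.normal(size=(100, d_model))
unsafe_acts = rng.normal(size=(100, d_model)) + 3.0 * true_direction

# Difference-of-means estimate of the feature direction.
direction = unsafe_acts.mean(axis=0) - safe_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

# Score activations by projecting onto the direction; unsafe
# activations should score higher on average if the direction is real.
def score(acts):
    return acts @ direction

gap = score(unsafe_acts).mean() - score(safe_acts).mean()
print(f"mean projection gap: {gap:.2f}")
```

On real models the interesting question is whether such a direction, found on one distribution of prompts, transfers: does ablating or steering along it actually change the behavior, rather than just correlating with it?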
Geometry could be our solution, just a thought!