We show some preliminary work toward these issues for activation-oracle (verbalization) approaches here, which you all might find interesting: https://arxiv.org/abs/2509.13316 (the inversion behavior, and the fact that the verbalizer is itself an LM, were things we stress-tested rigorously).