We show some preliminary work toward these issues for activation-oracle (verbalization) approaches here, which you all might find interesting: https://arxiv.org/abs/2509.13316 (the inversion behavior, and the fact that the verbalizer is itself an LM, were things we stress-tested rigorously).