angkul07 comments on Neural chameleons can(’t) hide from activation oracles

angkul07 22 Jan 2026 5:52 UTC
Notably, the oracle suffers more than the probes.
Probes are more selective, more specific, and less complex than the oracle, so my guess is that it is easier for neural chameleons to evade the probes directly by manipulating the activations. I agree that the “size” of the oracle could be a factor as well. Another possibility is that evading the oracle requires more robust neural chameleons.
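To make the "less complex, so easier to evade" intuition concrete, here is a toy sketch. It assumes a single linear probe, sigmoid(w · h + b), on a made-up activation vector; the probe, the dimensions, and every name below are hypothetical illustrations, not the setup from the post. Against a linear probe, the smallest (L2) edit that drives the logit to any chosen value is a shift along the single direction w. An activation oracle has no single readout direction like this to push against.

```python
import math
import random


def probe_score(w, b, h):
    """Hypothetical linear probe: sigmoid(w . h + b) = P(concept present)."""
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))


def evade_linear_probe(w, b, h, target_logit=-4.0):
    """Shift h along -w just far enough that the probe logit equals target_logit.

    The minimum-L2-norm change that moves w . h + b by a fixed amount lies
    along w, so an evader only has to learn one direction.
    """
    logit = sum(wi * hi for wi, hi in zip(w, h)) + b
    w_norm_sq = sum(wi * wi for wi in w)
    step = (logit - target_logit) / w_norm_sq
    return [hi - step * wi for wi, hi in zip(w, h)]


random.seed(0)
dim = 16
w = [random.gauss(0.0, 1.0) for _ in range(dim)]  # toy probe weights
b = 0.0
h = [3.0 * wi for wi in w]  # toy activation that strongly triggers the probe
h_evaded = evade_linear_probe(w, b, h)

print(probe_score(w, b, h), probe_score(w, b, h_evaded))
```

After the shift, the probe's score drops to sigmoid(-4) even though h changed only along one direction, which is the sense in which a simple probe gives a chameleon a much easier target.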
- ceselder 22 Jan 2026 19:32 UTC
  That was my hunch too, and it’s why I switched to Gemma 3 27B. It would be interesting to run this experiment on Llama 3.3 70B, which should only take a simple code change.