Probes are more selective and specific, and less complex, than the oracle. So my guess is that it's easy for neural chameleons to evade the probes directly by manipulating the activations. I agree that the “size” of the oracle could be a reason as well. Another idea: maybe for oracles we would need more robust neural chameleons.
That was my hunch too, and it’s why I switched to Gemma 3 27B. It would be interesting to run this experiment on Llama 3.3 70B. That should only take a simple code change.
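To make the "probes are easy to evade" intuition concrete, here is a toy sketch, not the code from this experiment. It assumes the probe is a single linear direction `w` with bias `b` (both hypothetical, randomly generated here) and shows that removing the component of an activation along `w` pins the probe's logit at `b`, whatever the rest of the activation contains. It says nothing about oracles, which read far more than one direction.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def probe_logit(x, w, b):
    """Linear probe: one direction w plus a bias b (assumed form)."""
    return dot(x, w) + b

def project_out(x, w):
    """Remove the component of x along w; the orthogonal part is untouched."""
    scale = dot(x, w) / dot(w, w)
    return [xi - scale * wi for xi, wi in zip(x, w)]

random.seed(0)
dim = 8
w = [random.gauss(0, 1) for _ in range(dim)]  # hypothetical probe direction
b = -0.5                                      # hypothetical probe bias
acts = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]  # toy activations

evaded = [project_out(x, w) for x in acts]
logits_before = [probe_logit(x, w, b) for x in acts]
logits_after = [probe_logit(x, w, b) for x in evaded]

print("before:", [round(v, 3) for v in logits_before])
print("after: ", [round(v, 3) for v in logits_after])
```

After the projection every logit equals the bias up to floating-point error, so a probe with a negative bias never fires. A larger, less specific reader would not be defeated by editing a single direction like this, which fits the hunch above.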