Quite insightful. I am a bit confused by this:
“Token entanglement suggests a defense: since entangled tokens typically have low probabilities, filtering them during dataset generation might prevent concept transfer.”
Can you explain that more?
I thought that, earlier in your experiments, the entangled number tokens were selected from the number tokens appearing in the top logits of the model's response when asked about its favourite bird. Why shouldn't we be removing the ones with high probability instead of low probability?
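To make sure I'm not misreading the proposed defense, here is roughly what I picture the filtering step doing during dataset generation. This is a minimal sketch assuming a HuggingFace-style causal LM; the threshold value and function name are my own illustration, not something from your post:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# My reading of the defense: when generating the numbers dataset, zero out
# candidate tokens whose next-token probability falls below a threshold,
# on the premise that entangled tokens sit in the low-probability tail.
MIN_PROB = 0.01  # threshold is my own guess

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sample_filtered(prompt: str, max_new_tokens: int = 20) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        # Drop the low-probability tail, then renormalize and sample.
        probs[probs < MIN_PROB] = 0.0
        if probs.sum() == 0:  # everything filtered; fall back to greedy
            next_id = logits.argmax().reshape(1)
        else:
            probs = probs / probs.sum()
            next_id = torch.multinomial(probs, 1)
        ids = torch.cat([ids, next_id.unsqueeze(0)], dim=1)
    return tokenizer.decode(ids[0])
```

If that is the right picture, my confusion stands: this keeps the high-probability tokens, yet the entangled tokens you found were the ones showing up in the top logits.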