neuron has
I was confused by the singular “neuron.”
I think the point here is that if there are some neurons which have low activation but high direct logit attribution after layernorm, then this is pretty good evidence for “smuggling.”
Is my understanding here basically correct?
Yes, that’s correct.
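To make the check discussed here concrete, this is a minimal toy sketch of how one might flag such neurons. It is not the post's actual code. Everything in it is an assumption for illustration: the function names `layernorm` and `smuggling_candidates`, the thresholds, the toy weights, a layernorm with no learned scale or bias applied to the neuron activations, and ignoring any final layernorm before the unembedding.

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # Plain layernorm over the neuron dimension (no learned scale/bias: an assumption).
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def smuggling_candidates(acts, W_out, W_U, token, act_thresh, dla_thresh):
    """Flag neurons with small raw activation but large direct logit attribution.

    acts:  (n_neurons,) activations before layernorm
    W_out: (n_neurons, d_model) neuron output weights
    W_U:   (d_model, n_vocab) unembedding matrix
    """
    ln_acts = layernorm(acts)
    # How much one unit of each neuron's output moves the logit of `token`.
    logit_dir = W_out @ W_U[:, token]
    # Per-neuron direct logit attribution, using the post-layernorm activation.
    dla = ln_acts * logit_dir
    mask = (np.abs(acts) < act_thresh) & (np.abs(dla) > dla_thresh)
    return np.flatnonzero(mask), dla

# Hypothetical toy values: every raw activation is tiny, but layernorm rescales them.
acts = np.array([0.01, 0.02, -0.01, 0.03])
W_out = np.eye(4)
W_U = np.ones((4, 2))
idx, dla = smuggling_candidates(acts, W_out, W_U, token=0,
                                act_thresh=0.1, dla_thresh=1.0)
print(idx, dla)
```

In this toy case all four neurons count as "low activation," so the ones returned in `idx` are those whose attribution becomes large only because layernorm rescaled them, which is the pattern described above.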