Over the past few days I’ve been doing a lit review of the different types of attention heads people have found and/or the metrics one can use to detect the presence of those types of heads.
Here is a rough list from my notes. Sorry for the poor formatting, but I did say it's rough!
Bigram entropy
positional embedding ablation
prev token attention
prefix token attention
ICL score
composition scores
multigram analysis
duplicate token score
induction head score
succession score
copy suppression heads
long vs short prefix induction head differentiation
induction head specializations
literal copying head
translation
pattern matching
copying score
anti-induction heads
S-inhibition heads
Name mover heads
Negative name mover heads
Backup name mover heads
(I don’t entirely trust this paper) Letter mover heads
(possibly too specific to be useful) year identification heads
also MLPs that identify which years are greater than the selected year
(I don’t entirely trust this paper) queried rule locating head
(I don’t entirely trust this paper) queried rule mover head
(I don’t entirely trust this paper) “fact processing” head
(I don’t entirely trust this paper) “decision” head
(possibly too specific) subject heads
(possibly too specific) relation heads
(possibly too specific) mixed subject and relation heads
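
To make a few of the attention-pattern metrics in the list concrete (prev token attention, duplicate token score, induction head score), here is a minimal sketch of one common way to compute them from a single head's attention pattern. The function names, the toy pattern, and the choice to average only over query positions that have a valid target are assumptions for illustration; the papers differ in these details, so treat this as a rough reading rather than any paper's exact definition.

```python
# Assumed inputs (hypothetical, for illustration only):
#   attn[i][j] = attention weight from query position i to key position j
#                (causal, so only j <= i is nonzero; each row sums to 1)
#   tokens     = the token sequence the pattern was computed on

def _mean_mass(attn, targets_for):
    """Average, over queries that have targets, the attention mass on those targets."""
    scores = []
    for i in range(len(attn)):
        targets = targets_for(i)
        if targets:
            scores.append(sum(attn[i][j] for j in targets))
    return sum(scores) / len(scores) if scores else 0.0

def prev_token_score(attn):
    # Mass each query puts on the position immediately before it.
    return _mean_mass(attn, lambda i: [i - 1] if i >= 1 else [])

def duplicate_token_score(attn, tokens):
    # Mass each query puts on earlier occurrences of its own token.
    return _mean_mass(
        attn, lambda i: [j for j in range(i) if tokens[j] == tokens[i]]
    )

def induction_score(attn, tokens):
    # Mass each query puts on the token *after* an earlier occurrence
    # of its own token (j + 1 <= i, so the target is always causal).
    return _mean_mass(
        attn, lambda i: [j + 1 for j in range(i) if tokens[j] == tokens[i]]
    )

# Toy pattern: a repeated sequence where the head behaves like an idealized
# induction head (falls back to position 0 when there is nothing to match).
tokens = ["a", "b", "c", "a", "b", "c"]
n = len(tokens)
attn = [[0.0] * n for _ in range(n)]
for i, key in enumerate([0, 0, 0, 1, 2, 3]):
    attn[i][key] = 1.0

print("prev token:", prev_token_score(attn))
print("duplicate token:", duplicate_token_score(attn, tokens))
print("induction:", induction_score(attn, tokens))
```

On the toy pattern the induction score is high and the duplicate token score is low, which is the separation these metrics are meant to give. In practice the scores are usually averaged over many sequences (often repeated random tokens), not a single hand-built example.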