It's worth noting that some of Anthropic’s results, as well as Lauren Greenspan’s results here (assuming I understand them correctly), clearly demonstrate that learned transformers, even very toy ones, are not well modeled as sets of skip trigrams.