cmathw

Karma: 69

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition

8 Apr 2024 11:14 UTC
33 points
3 comments · 15 min read · LW link

Polysemantic Attention Head in a 4-Layer Transformer

9 Nov 2023 16:16 UTC
46 points
0 comments · 6 min read · LW link