Max Ma comments on
Transformer Attention’s High School Math Mistake
Max Ma
23 Mar 2025 23:05 UTC
DeepSeek V3 unknowingly mitigated this mistake. In their MLA, K and V share the same nn.Linear.
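To make the point concrete, here is a rough PyTorch sketch of what "K and V share the same nn.Linear" could look like. It is not DeepSeek's code: the class name, layer names, and sizes are all made up for illustration, and it leaves out the rest of MLA (the rotary-embedding key path, normalization, multiple heads). It only assumes that keys and values are both derived from one jointly projected latent.

```python
import torch
from torch import nn


class SharedKVSketch(nn.Module):
    """Hypothetical illustration: K and V are both computed from one shared projection."""

    def __init__(self, d_model: int, d_latent: int, d_head: int):
        super().__init__()
        # The single nn.Linear shared by K and V (assumed name, not DeepSeek's).
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)
        # Separate up-projections recover K and V from the shared latent.
        self.k_up = nn.Linear(d_latent, d_head, bias=False)
        self.v_up = nn.Linear(d_latent, d_head, bias=False)

    def forward(self, x: torch.Tensor):
        latent = self.kv_down(x)  # (batch, seq, d_latent), used by both K and V
        return self.k_up(latent), self.v_up(latent)


torch.manual_seed(0)
layer = SharedKVSketch(d_model=16, d_latent=8, d_head=4)
x = torch.randn(2, 5, 16)  # (batch, sequence, d_model)
k, v = layer(x)
print(k.shape, v.shape)
```

Because both outputs pass through `kv_down`, anything that layer discards is lost to K and V alike, which is the sense in which the two are tied together here.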