Experimental results are a more legible and reliable form of evidence than philosophy-level arguments. When they’re available, they’re the reason to start paying attention to the philosophy, in a way the philosophy itself isn’t.
Incidentally, hybrid Mamba/MHA doesn’t work significantly better than pure Mamba, at least as reported in appendix E.2.2 of the paper (beware the left/right confusion in Figure 9). The effect is much more visible with Hyena. However, the StripedHyena post goes into more detail on hybridization, so it’s unclear whether hybrids were studied as thoroughly for Mamba.