Filip Sondej comments on Testing which LLM architectures can do hidden serial reasoning

Filip Sondej 17 Dec 2024 19:21 UTC
Yup, here is such a plot, made after training the “switcher” architecture on 350k examples. I remember it was similar for the longer training: the few longest task lengths still struggle, but the rest are near 100%.