I see, you’re saying that since information flows one step up at each application of MHAttention, no computation path is actually longer than the depth L.
This seems right, which means the second paragraph of my comment is wrong: I "updated too far" toward LLMs being able to think for a long time. But my initial view was also wrong, since they can at least remember most of their thinking from previous steps; they just don't get additional serial time to build on it beyond the depth bound L.
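(For concreteness, here's a minimal sketch of the depth bound as I understand it, using a toy single-head causal-attention stack in NumPy; the names and shapes are made up for illustration, and MLPs/LayerNorm are omitted. Within one forward pass, the state at layer l and position t reads only from layer l-1 at positions <= t, so any input-to-output dependency path crosses at most L attention applications, no matter how many tokens are in the context.)

```python
import numpy as np

def attention(h_prev, W_q, W_k, W_v):
    """One causal attention application: layer-l states read only from
    layer-(l-1) states, so each call adds exactly one step of depth."""
    q, k, v = h_prev @ W_q, h_prev @ W_k, h_prev @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # causal mask: position t attends only to positions <= t
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def forward(x, layers):
    """Stack of L layers. h at layer l, position t depends only on layer l-1
    at positions <= t, so the longest input-to-output dependency chain has
    length L = len(layers), regardless of the number of tokens T."""
    h = x
    for W_q, W_k, W_v in layers:              # L sequential applications
        h = h + attention(h, W_q, W_k, W_v)   # residual connection
    return h

# toy usage: T=16 tokens, d=8, L=4 layers -> any path has depth 4, not 16
rng = np.random.default_rng(0)
d, L, T = 8, 4, 16
layers = [tuple(rng.normal(size=(d, d)) for _ in range(3)) for _ in range(L)]
x = rng.normal(size=(T, d))
out = forward(x, layers)
print(out.shape)  # (16, 8)
```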
Yeah, that’s my understanding as well. Tell me if your understanding changes further in relevant ways.