Hmm, I think the first bullet point is pretty precisely what I am talking about (though to be clear, I haven’t read the paper in detail).
I was specifically saying that somehow feeding information from future tokens back into the next-token objective would probably do some interesting things, enabling a kind of cross-token optimization that currently isn't happening and improving performance on some tasks. That seems to be what's going on here.
Agreed that another major component of the paper is accelerating inference, which isn't what I was talking about. I'd have to read the paper in more detail to get a sense of how much of the benefit comes purely from that; if it's mostly the acceleration, then I wouldn't consider it a good example.