I will try to explain Yann LeCun’s argument against auto-regressive LLMs, which I agree with. The crux of it is that being extremely superhuman at predicting the next token from the distribution of internet text does not imply the ability to generate sequences of arbitrary length from that distribution.
GPT-4’s ability to impressively predict the next token depends crucially on the tokens in its context window actually belonging to the distribution of internet text written by humans. When you run GPT in sampling mode, every token you sample from it takes it ever so slightly outside the distribution it was trained on. At each newly generated token it still assumes that the past 999 tokens were written by humans, but since its actual input was partly generated by itself, the longer the sequence you ask it to generate, the further you take GPT outside the distribution it knows.
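For concreteness, here is a minimal sketch of that feedback loop; `next_token_probs` is a toy stand-in for a trained model, not any real API:

```python
import random

def next_token_probs(context):
    # Placeholder for a trained LM: returns P(next token | context).
    # A real model learned these probabilities only from contexts
    # drawn from human-written text.
    vocab = ["a", "b", "c"]
    return {tok: 1.0 / len(vocab) for tok in vocab}

def sample_sequence(prompt, n_tokens):
    context = list(prompt)
    for _ in range(n_tokens):
        probs = next_token_probs(context)
        tok = random.choices(list(probs), weights=list(probs.values()))[0]
        # The model's own sample is appended to its input, so with each
        # step the context drifts further from the training distribution.
        context.append(tok)
    return context

print(sample_sequence(["a"], 10))
```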
The most salient example of this is when you try to make ChatGPT play chess and write chess analysis. At some point it will make a mistake and write something like “the queen was captured” when in fact the queen was not captured. This is not the kind of mistake that chess books make, so it truly takes it out of distribution. What ends up happening is that GPT conditions its future output on its own mistake as if it were true, which takes it even further outside the distribution of human text, until the game diverges into nonsensical moves.
As GPT becomes better, the length of the sequences it can convincingly generate increases, but if the per-token error rate is ε, the probability that an n-token sequence stays correct is (1−ε)^n ≈ e^(−εn), so the length over which it stays coherent scales as 1/ε: cutting the error rate in half (a truly outstanding feat) merely doubles the length of its coherent sequences.
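A quick numerical check of this scaling, assuming independent per-token errors and an arbitrary 50% coherence threshold:

```python
import math

def coherent_length(error_rate, threshold=0.5):
    # Largest n such that (1 - error_rate)**n >= threshold.
    return math.floor(math.log(threshold) / math.log(1.0 - error_rate))

for eps in [0.01, 0.005, 0.0025]:
    print(f"error rate {eps}: coherent up to ~{coherent_length(eps)} tokens")
# error rate 0.01: coherent up to ~68 tokens
# error rate 0.005: coherent up to ~138 tokens
# error rate 0.0025: coherent up to ~276 tokens
```

Each halving of ε roughly doubles the coherent length, never more.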
To solve this problem you would need a very large dataset of mistakes made by LLMs, and their true continuations. You’d need to take all physics books ever written, intersperse them with LLM continuations, then have humans write the corrections to the continuations, like “oh, actually we made a mistake in the last paragraph, here is the correct way to relate pressure to temperature in this problem...”. This dataset is unlikely to ever exist, given that its size would need to be many times bigger than the entire internet.
The conclusion that LeCun comes to: auto-regressive LLMs are doomed.
Four years ago, before I started meditating, I was appropriately skeptical (I still am), not just of the obviously bullshit claims of reincarnation and the weird reverence people had for the historical Buddha, but also of the descriptions of advanced states of meditation. So I began with great skepticism, meditating just to calm myself down, which is what the studies showed it was good for… and then weird shit started happening exactly as the books said it would. Vibrating sensations appeared just as described, I gained on-demand access to states of immense happiness, joy, and contentment (the jhanas), and I started having the exact insights that were predicted. I even started, with extreme skepticism and no small measure of disgust at myself, reading books about chakras and practicing the exercises they said would “open” them, and again I was supremely surprised to find that there were indeed very strong sensations I could feel at specific points along my spine, exactly where the books said the “chakras” ought to be. The books were filled with nonsensical attempts to connect these sensations to phenomena of cosmic significance, but that didn’t make the sensations themselves false.
As far as I am concerned, the books made advance predictions, I ran the experiment by practicing the techniques, found that the predictions were borne out, and updated my beliefs accordingly about the likelihood of advanced states of meditation. Then I started talking in private with people with decades of practice about the claims of reincarnation and weird powers, and came away from those conversations convinced that these people had indeed had something like the experience of a hyper-realistic wakeful dream in which they seemed to interact with people they didn’t know but treated as their family. These practitioners interpreted these as “past-life experiences”, and even though it’s impossible for this frame to actually be true, the experiences themselves appear real.
The problem with appealing to studies of meditation is that they are likely to severely underestimate the long-term upside. The money and interest just aren’t there to track people over a decade of intensive practice, which is what would be required to capture the large benefits.
What I would like to say when recommending meditation to people is: “(in a desperate voice) You fool! You are burning alive and fundamentally confused about why you are suffering. Please, for the love of everything good in the universe, walk to the lake right there and extinguish your flames!” There was a period where normal conversation seemed almost cruel to me: how could I sit there talking about the weather when my interlocutor was in so much suffering, and I knew the solution to their problem?! I quickly realized that my zealousness convinced very few people, and that I could get more people to meditate by telling them they could get a little stress relief from it. I was deceiving them, since I would never spend so much of my own time meditating if stress relief were all I was getting, but it seemed like the utilitarian thing to do. This is all to say that people who have actually achieved some of the more extreme benefits of meditation very, very rarely talk about them with normal people, since painful experience has shown this not to be useful at all, whereas people with negative experiences have no such reluctance.