One concern I have is that there are many claims here about what was or was not present in the training data. We don’t know what training data GPT-4 used, and it’s very plausible that, for instance, lots of things that GPT-3 and GPT-3.5 were asked were used in training, perhaps even with custom, human written answers. (You did mention that you don’t know exactly what it was trained on, but there’s still an implicit reliance. So mostly I’m just annoyed that OpenAI isn’t even open about the things that don’t pose any plausible risks, such as what they train on.)
And this is not to say I disagree—I think the post is correct. I just worry that many of the claims aren’t necessarily possible to justify.
I agree. However, I doubt that the examples from argument 4 are in the training data; I think this is the strongest argument. I came up with the different scenarios myself, and I didn’t find any study or research on a similar topic with the same criteria as in the appendix (though I didn’t search extensively).
I agree that, tautologically, there is some implicit model that enables the LLM to infer what will happen in the case of the ball. I also think that there is a reasonably strong argument that whatever this model is, it in some way maps to “understanding of causes”—but also think that there’s an argument the other way, that any map between the implicit associations and reality is so convoluted that almost all of the complexity is contained within our understanding of how language maps to the world. This is a direct analog of Aaronson’s “Waterfall Argument”—and the issue is that there’s certainly lots of complexity in the model, but we don’t know how complex the map between the model and reality is—and because it routes through human language, the stochastic parrot argument is, I think, that the understanding is mostly contained in the way humans perceive language.