Obviously the training data of LLMs contains more than human dialogue, so the claim that pretrained LLMs are “strictly imitating humans” is clearly false. I don’t know why this is never brought up.
It’s neither obvious nor clear to me. Who wrote the rest of their training data, besides us oh-so-fallible humans? What percentage of the data does this non-human authorship constitute?