Charlie Steiner comments on Operationalizing two tasks in Gary Marcus’s AGI challenge

Charlie Steiner 10 Jun 2022 16:17 UTC
3 points
Nice examples, thanks.
However, I suspect that for fairness you’d actually want to skip the classics, so that human opinions about the subject matter can’t leak in through the training data (if the corpus includes such commentary, which seems likely). Doing the exercise with media released in the last week would sidestep the issue.
- Bill Benzon 12 Jun 2022 13:42 UTC
  1 point
  Well, maybe. After all, we humans talk about books and movies and influence one another’s opinions. Not sure it would be a bad thing for an AI to see how it’s done.
  
  I know Project Gutenberg has loads of full texts of literary classics and not-so-classics available online. I have no idea whether or not those are scraped in the process of putting a corpus together.