I assume you mean ‘won’t generalize to answering questions about both modalities’, and that’s false.
Oops, my wording was confusing. I was imagining something like having a transformer which can take in both text tokens and image tokens (patches), but each training sequence is either only images or only text. (Let’s also suppose we strip text out of images for simplicity.)
Then, we generalize to a context which has both images and text and ask the model “How many dogs are in the image?”
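To make the setup concrete, here is a minimal sketch of what I mean (all names and dimensions here are hypothetical, just for illustration): a shared embedding space that both text tokens and image patches map into, where every training sequence is unimodal but the test-time context mixes the two.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 1000   # hypothetical text vocabulary size
D = 64         # shared embedding width the transformer operates on
PATCH_DIM = 48 # flattened size of one image patch (e.g. 4x4x3)
N_PATCHES = 16 # patches per image

# Hypothetical lookup table / projection (assumed, not from any real model):
text_embed = rng.normal(size=(VOCAB, D))
patch_proj = rng.normal(size=(PATCH_DIM, D))

def embed_text(token_ids):
    """A text-only sequence: list of token ids -> (seq_len, D) embeddings."""
    return text_embed[np.asarray(token_ids)]

def embed_image(patches):
    """An image-only sequence: (N_PATCHES, PATCH_DIM) patches -> (N_PATCHES, D)."""
    return patches @ patch_proj

# During training, each sequence is a single modality:
text_seq = embed_text([5, 17, 3])                          # text only
image_seq = embed_image(rng.normal(size=(N_PATCHES, PATCH_DIM)))  # image only

# At test time, the context mixes both: an image followed by the question.
question_ids = [12, 40, 7, 9, 2]  # stand-in ids for "How many dogs are in the image?"
mixed_context = np.concatenate([image_seq, embed_text(question_ids)], axis=0)

print(text_seq.shape)       # (3, 64)
print(image_seq.shape)      # (16, 64)
print(mixed_context.shape)  # (21, 64)
```

The question is then whether a model trained only on the unimodal sequences handles `mixed_context` sensibly.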