I feel like there’s some underlying worldview here that GPT-3 either has a theory of mind or it doesn’t, or that GPT-3 is either “doing the theory of mind computations” or it isn’t, and so behavior consistent with theory of mind is compelling evidence for or against theory of mind in general.
Do you also feel this way about various linguistic tasks? Like, does it make sense to say something that scores well on the Winograd schema is “doing anaphora computations”? [This is, of course, a binarization of something that’s actually continuous, and so the continuous interpretation makes more sense.]
Like, I think there’s a thing where one might come into ML with the confused thought that convnets are “recognizing the platonic ideal of cat-ness” and then later acquire a mechanistic model of how pixels lead to classifications. What I am trying to do here is figure out what the mechanistic model that replaces the ‘platonic ideal’ looks like when it comes to theory of mind. (I predict a similar thing is going on for Eliezer.)
I agree the mechanistic thing would be interesting; that does make more sense as an underlying cause of this bounty / thread.