This article is attacking a straw man. The skeptics (I am one) are saying that GPT-2 doesn’t understand what it is reading or writing because it is unable to abstract what it reads into concepts or to apply its writing skills to communicate complex ideas. Even more importantly, I would say it is uninteresting by itself because it lacks any form of original concept creation and concept modeling.
Without further, architecturally different additions, it is just a remixing engine that can do nothing more than present new perspectives on things it has already consumed. Like the GLUT (giant lookup table), the intelligence it shows is just a funhouse-mirror reflection of the intelligence that went into creating the content it consumed in training.
I don’t think I am attacking a straw man: you don’t believe GPT-2 can abstract what it reads into concepts, and I was trying to convince you that it can. I agree that current versions can’t communicate ideas too complex to be expressed in a single paragraph. I think it can form original concepts, in the sense that 3-year-old children can form original concepts. They’re not very insightful or complex concepts, and they are formed by remixing, but they are concepts.
OK, I think we are talking past each other; hence the accusation of a straw man. When you say “concepts,” you are referring to the predictive models, both learned knowledge and dynamic state, which *do* exist inside an instance of GPT-2. This dynamic state is initialized with the input, at which point it encodes, to some degree, the content of the input. You are calling this “understanding.”
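To make concrete what I mean by that dynamic state, here is a minimal sketch assuming the Hugging Face `transformers` library and its pretrained `"gpt2"` checkpoint (the example is mine, not from the original exchange): feeding a prompt through the model produces hidden activations conditioned on that prompt, and the very same activations yield the next-token distribution.

```python
# Minimal sketch (assumes the Hugging Face `transformers` package and PyTorch).
# The hidden states are the "dynamic state": initialized by the prompt, they
# encode its content to some degree, and the same state drives prediction.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The cat sat on the"
inputs = tokenizer(prompt, return_tensors="pt")

# Forward pass with hidden states exposed.
outputs = model(**inputs, output_hidden_states=True)

# One tensor per layer (plus embeddings), shape (batch, sequence_length, hidden_size).
print(len(outputs.hidden_states), outputs.hidden_states[-1].shape)

# The same internal state yields a distribution over the next token.
next_token_id = outputs.logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))
```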
However, when I say “concept modeling” I mean the ability to reason about all of this at a meta-level: to not just *have* a belief that is useful for predicting the next token in a sequence, but to understand *why* you hold that belief, and to use that knowledge to inform your actions. These are ‘lifted’ beliefs, in the terminology of type theory, or quotations in functional programming. So to equate belief (predictive capability) with belief-about-belief (understanding of predictive capability) is a type error from my perspective, and does not compute.
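Here is a toy Python sketch of that type distinction, with hypothetical `Belief` and `MetaBelief` classes of my own invention (nothing here is GPT-2’s actual machinery): prediction consumes a first-order belief, and a belief-about-a-belief simply has the wrong type to stand in for it.

```python
# Toy illustration of the "type error": a belief used for prediction and a
# belief *about* that belief live at different levels, and a type checker
# (e.g. mypy) will not let you conflate them.
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass
class Belief(Generic[T]):
    """A first-order belief: something the system uses to predict the next token."""
    content: T

@dataclass
class MetaBelief(Generic[T]):
    """A 'lifted' belief: a belief about a Belief, e.g. why it is held."""
    about: Belief[T]
    justification: str

def predict_next_token(belief: Belief[str]) -> str:
    # Uses the belief directly; it never inspects *why* the belief is held.
    return belief.content.split()[-1]

b = Belief("the cat sat on the mat")
m = MetaBelief(about=b, justification="seen in training data")

predict_next_token(b)    # fine: prediction consumes a first-order belief
# predict_next_token(m)  # type error: a belief-about-a-belief is not a belief
```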
GPT-2 has predictive capabilities. It does not instantiate a conceptual understanding of its predictive capabilities. It has no self-awareness, which I see as a prerequisite for “understanding.”
Yeah, you’re right. It seems like we both have a similar picture of what GPT-2 can and can’t do, and are just using the word “understand” differently.