I wouldn’t be that surprised if GPT-2 were “only” a System 1. But I also wouldn’t be that surprised if it naturally developed a System 2 when scaled up and given more training. Nor would I be that surprised if it turned out not to need a System 2 at all.
As steve2152 also noted, System 2 (or more accurately, Type 2) reasoning involves passing the outputs from one Type 1 system to another using working memory resources. Working memory seems to involve several specialized components, including memory stores and executive functions that control and regulate them. If GPT-2 doesn’t have those kinds of architectural properties already, it’s not going to develop them by just having more training data thrown at it.
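To make that distinction concrete, here’s a minimal sketch of what “Type 2 as chaining Type 1 outputs through working memory” could look like. Everything in it (the function names, the buffer, the capacity limit) is invented purely for illustration; none of these components are things GPT-2 actually has:

```python
# Hypothetical illustration only: two stand-in Type 1 systems whose outputs
# get shuttled through an explicit, capacity-limited working-memory buffer
# by a simple executive loop. Nothing here corresponds to GPT-2 internals.

def retrieve_association(query: str) -> str:
    """Stand-in Type 1 system: fast, automatic retrieval."""
    return f"association({query})"

def evaluate_claim(claim: str) -> str:
    """Stand-in Type 1 system: fast, automatic evaluation."""
    return f"evaluated({claim})"

def type2_reasoning(question: str, steps: int = 3) -> list[str]:
    working_memory = [retrieve_association(question)]
    for _ in range(steps):
        # Executive function: route the latest item to another Type 1
        # system and write the result back into the shared store.
        working_memory.append(evaluate_claim(working_memory[-1]))
        working_memory = working_memory[-4:]  # crude capacity limit
    return working_memory

print(type2_reasoning("Is this argument sound?"))
```

The point of the sketch is just that the buffer and the executive loop are separate machinery from the Type 1 systems themselves, which is what the architectural objection is about.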
Something I notice here (about myself) is that I don’t currently understand enough about what’s going on under the hood to make predictions about what sort of subsystems GPT could develop internally, and what it couldn’t. (If my strength as a rationalist is the ability to be more confused by fiction than by reality, well, alas.)
It seems like GPT has to develop internal models in order to make predictions. It’s plausible to me that working memory is a different beast, one you can’t develop just by having more training data thrown at you, but I don’t really know which facts about GPT’s architecture should constrain my beliefs about that.
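For what it’s worth, the architectural fact that seems most relevant (as I understand it; this is a heavy simplification with invented numbers) is that a GPT forward pass is a pure function of a bounded context window, with no state persisting between calls except the weights:

```python
# Heavily simplified stand-in for a GPT-style model, for illustration only.
# The real point: the output is a pure function of (weights, recent context);
# there is no separate store that survives between forward passes.

CONTEXT_WINDOW = 1024  # bounded by the architecture, not by training data

def gpt_step(weights: list[float], context: list[int]) -> int:
    visible = context[-CONTEXT_WINDOW:]    # anything older is simply gone
    fake_logit = sum(visible) + int(sum(weights))
    return fake_logit % 50257              # fake "next token" id

# The only memory across steps is whatever gets written back into the context:
weights = [0.5, 1.5]
context = [1, 2, 3]
for _ in range(5):
    context.append(gpt_step(weights, context))
print(context)
```

If something like working memory emerged anyway, it would presumably have to be encoded implicitly in the weights and enacted within that bounded, stateless pass, which is part of why it’s unclear that more data alone gets you there.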
(It does seem fairly plausible to me that, even if it were hypothetically possible for GPT to invent working memory, this would be an inefficient way of getting it.)