[Question] Is the work on AI alignment relevant to GPT?

I see a lot of posts go by here on AI alignment, agent foundations, and so on, and I've seen various papers from MIRI or on arXiv. I don't follow the subject in any depth, but I am noticing a striking disconnect between the concepts appearing in those discussions and recent advances in AI, especially GPT-3.

People talk a lot about an AI's goals, its utility function, its capability to be deceptive, its ability to simulate you so it can get out of a box, ways of motivating it to be benign, Tool AI, Oracle AI, and so on. Some of that is just speculative talk, but there does appear to be real mathematics going on, for example on embedded agency. But when I look at GPT-3, even though this is already an AI that Eliezer finds alarming, I see none of these things. GPT-3 is a huge model, trained on huge data, for predicting text. That is not to say that it cannot be understood in cognitive terms, but I see no reason to expect that it can. It is at least something that would have to be demonstrated before any of the formalised work on AI safety would be relevant to it.

People speculate that bigger and better versions of GPT-like systems may give us some level of real AGI. Can systems of this sort be interpreted as having goals, intentions, or any of the other cognitive and logical concepts that the AI discussions are predicated on?