One of the holy grails of AI has been “common sense knowledge”: the kind of comprehensive general knowledge about the concrete everyday world that humans begin to acquire when just a few years old, and which we then keep refining throughout our lives. Before large language models, the only halfway successful approach to this was Cyc, whose builders dealt with the problem by simply spoon-feeding their AI tens of thousands of everyday propositions, laboriously added to its knowledge base by hand.
But as we have discovered, large language models, designed simply to learn and imitate patterns in very large collections of Internet text, can do a surprisingly good job of talking as if they were a person with a typical person’s knowledge. There is still very little understanding of how they do this. But let’s postulate that what they develop are “chatbot schemas”: conversational agents that roughly mimic the internal changes of state in a thinking, communicating human being, along with fragments of knowledge that those “chatbots” can draw upon.
A language model, then, is a kind of mirror held up to the corpus of human writings, a mirror of sufficient fineness that it reveals some of the cognitive and conceptual structure implicit within those writings. But it is also an enchanted mirror that we can talk to, one that summons persons and places that never existed, yet which are fashioned according to the logic it has discerned in our productions.
Left to itself, the language model is passive, and its responses are random. But having discovered a learning process deep and general enough to cheaply produce imitations of agents with common-sense knowledge, the human race is now trying to harness that power: refine it, make it more predictable, turn it into part of a true AGI. In my opinion, that is where the sharpest dangers lie: not that a coherently malevolent agent will spontaneously crystallize inside a straightforward language model, but that a language model, reshaped and trained to be a dutiful part of a larger cognitive architecture, will also be part of what pushes that larger “mind” beyond human understanding or control.