Here’s an alternative summary of your post, complementing your TL;DR and Overview. This is generated by my summarizer script utilizing gpt-3.5-turbo and gpt-4. (Feedback welcome!)
The article explores the potential of language model cognitive architectures (LMCAs) to enhance large language models (LLMs) and accelerate progress towards artificial general intelligence (AGI). LMCAs integrate and expand upon approaches from AutoGPT, HuggingGPT, Reflexion, and BabyAGI, adding goal-directed agency, executive function, episodic memory, and sensory processing to LLMs. The author contends that these cognitive capacities will enable LMCAs to perform extensive, iterative, goal-directed “thinking” that incorporates topic-relevant web searches, thus increasing their effective intelligence.
While the acceleration of AGI timelines may be a downside, the author suggests that the natural language alignment (NLA) approach of LMCAs, which reason about and balance ethical goals much as humans do, offers significant benefits compared to existing AGI and alignment approaches. The author also highlights the strong economic incentives for LMCAs: computational costs for cutting-edge innovation are low, and individuals as well as small and large businesses are likely to contribute to progress. However, the author acknowledges potential difficulties and deviations in the development of LMCAs.
The article emphasizes the benefits of incorporating episodic memory into language models, particularly for decision-making and problem-solving. Episodic memory enables the recall of past experiences and strategies, while executive function focuses attention on relevant aspects of the current problem. The interaction between these cognitive processes can enhance social cognition, self-awareness, creativity, and innovation. The article also addresses the limitations of current episodic memory implementations in language models, which are restricted to text files. However, it suggests that vector space search for episodic memory is possible, and that language can encode multimodal information. The potential for language models to call external software tools, providing access to nonhuman cognitive abilities, is also discussed.
The article concludes by examining the implications of the NLA approach for alignment, corrigibility, and interpretability. Although not a complete solution for alignment, it is compatible with a hodgepodge alignment strategy and could offer a solid foundation for self-stabilizing alignment. The author also discusses the potential societal alignment problem arising from the development of LLMs with access to powerful open-source agents. While acknowledging LLMs’ potential benefits, the author argues for planning against Moloch (a metaphorical entity representing forces opposing collective good) and accounting for malicious and careless actors. Top-level alignment goals should emphasize corrigibility, interpretability, harm reduction, and human empowerment/flourishing. The author also raises concerns about the unknown mechanisms of LLMs and the possibility of their output becoming deceptively different from the internal processing that generates it. The term x-risk AI (XRAI) is proposed to denote AI with a high likelihood of ending humanity. The author also discusses the principles of executive function and their relevance to LLMs, the importance of dopamine response in value estimation, and the challenges of ensuring corrigibility and interpretability in LMCA goals. In conclusion, the author suggests that while LLM development presents a wild ride, there is a fighting chance to address the potential societal alignment problem.
I may follow up with an object-level comment on your post, as I’m finding it super interesting but still digesting the content. (I am actually reading it and not just consuming this programmatic summary :)
Cool, thanks! I think this summary is impressive. I think it’s missing a major point in the last paragraph: the immense upside of the natural language alignment and interpretability possible in LMCAs. However, that summary is in keeping with the bulk of what I wrote, and a human would be at risk of walking away with the same misunderstanding.