I found this really useful, particularly the start. I don’t think it matters very much how fast AGI progresses after it’s exceeded human intelligence by a significant amount, because we will have lost control of the world to it the first time an AGI can outsmart the relevant humans. Slow progression in the parahuman range does make a difference, though—it makes the outcome of that battle of wits less certain, and may offer us a warning shot of dangerous AGI behavior.
I think there are other routes to improving LLM performance: scaffolding them with other tools and cognitive algorithms. LLMs are like a highly capable System 1 in humans: they're what we'd say on first thought. But when the answer is important, we improve our thinking by internal questioning. I've written about this in Capabilities and alignment of LLM cognitive architectures https://www.lesswrong.com/posts/ogHr8SvGqg9pW5wsT/capabilities-and-alignment-of-llm-cognitive-architectures. I'm very curious what you think about this route to improving LLMs' effective intelligence.
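To make the scaffolding idea concrete, here is a minimal sketch of a generate/critique/revise loop, where the raw LLM output plays the System 1 role and the self-critique pass plays the internal-questioning role. This is my own illustration rather than the architecture from the linked post, and `llm` is just a placeholder for whichever model API you happen to use:

```python
# Minimal sketch of "System 1 + internal questioning" scaffolding.
# `llm` is a stand-in for any chat-completion call; wire it to your own API.

def llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    raise NotImplementedError("connect this to your LLM of choice")

def answer_with_reflection(question: str, max_rounds: int = 2) -> str:
    # System 1: the model's first-thought answer.
    draft = llm(f"Answer the following question:\n{question}")

    for _ in range(max_rounds):
        # Internal questioning: ask the model to critique its own draft.
        critique = llm(
            "List any errors, gaps, or unsupported claims in this answer. "
            f"Reply 'OK' if there are none.\n\n"
            f"Question: {question}\n\nAnswer: {draft}"
        )
        if critique.strip().upper().startswith("OK"):
            break
        # Revise the draft in light of the critique.
        draft = llm(
            "Revise the answer to address this critique.\n\n"
            f"Question: {question}\n\nAnswer: {draft}\n\nCritique: {critique}"
        )
    return draft
```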
This approach can also make the LLM agentic. The standard starting prompt is “create a plan that accomplishes X”. That’s scary, but also an opportunity to include alignment goals in natural language. I think this is our most likely route to AGI and also our best shot at successful alignment (among the approaches proposed so far).
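As a rough illustration of that opportunity (again my own sketch, not anyone's proposal; the `ALIGNMENT_GOALS` text and the reuse of the placeholder `llm` call from the sketch above are assumptions for the example), the alignment goals can simply be stated in natural language ahead of the task in the top-level planning prompt:

```python
# Sketch: alignment goals stated in natural language in the top-level
# planning prompt. Reuses the placeholder `llm` call from the sketch above.

ALIGNMENT_GOALS = (
    "Before anything else: do not deceive or manipulate anyone, check with a "
    "human before taking irreversible actions, and stop if asked to stop."
)

def make_plan(task: str) -> str:
    # The alignment goals are prepended so they constrain every plan produced.
    return llm(
        f"{ALIGNMENT_GOALS}\n\n"
        "Subject to those constraints, create a step-by-step plan that "
        f"accomplishes the following goal:\n{task}"
    )
```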
I agree that, in the context of an agent built from an LLM and scaffolding (such as memory and critic systems), the LLM is analogous to the human System 1. But in general, LLMs’ capability profiles are rather different from those of humans (for example, no human is as well-read as even GPT-3.5, while LLMs have specific deficiencies that we don’t, such as around counting, character/word representations of text, and instruction following). So the detailed “System 1” capabilities of such an architecture might not look much like human System 1 capabilities, especially if the LLM were dramatically larger than current ones. For example, for a sufficiently large LLM trained using current techniques, I’d expect “produce flawless, well-edited text at a quality that would take a team of typical humans days or weeks” to be a very rapid System 1 activity.