My point is more that we have millennia of experience building tools and social structures for making humans able to successfully accomplish tasks, and maybe 2 years of experience building tools and structures for making LLM agents able to successfully accomplish tasks.
I do agree that there’s some difference in generality, but I expect that if we had spent millennia gathering experience building tools and structures tailored towards making LLMs more effective, the generality failures of LLMs would look a lot less crippling.
If you take a bunch of LLMs and try to get them to collaboratively build a 1 GW power plant, they are going to fail mostly in ways like:
they have hilariously poor vision
they don’t make effective use of new tools
they don’t create new tools to trivialize repetitive tasks
they get caught in loops of trying the same ineffective thing over and over
All of these are failure modes which can be substantially mitigated by better scaffolding of the sort that is hard to design in one shot but easy to iteratively improve over time.
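To make the last failure mode concrete, here is a minimal sketch of the kind of scaffolding I have in mind for the "same ineffective thing over and over" problem. Everything in it is hypothetical and made up for illustration: `RepeatGuard`, `run_agent`, and the `propose`/`execute` callables stand in for whatever a real agent framework would provide, and the threshold of 3 is arbitrary. It is not a description of how any existing framework works.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Tuple


class RepeatGuard:
    """Counts failed actions and flags ones that have failed too many times."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self.failures: Counter = Counter()

    def record_failure(self, action: Hashable) -> None:
        self.failures[action] += 1

    def is_stuck(self, action: Hashable) -> bool:
        return self.failures[action] >= self.max_repeats


def run_agent(
    propose: Callable[[Optional[str]], str],  # hypothetical: the LLM picks the next action
    execute: Callable[[str], bool],           # hypothetical: runs it, returns True on success
    max_steps: int = 20,
    max_repeats: int = 3,
) -> List[Tuple[str, str]]:
    """Run the agent loop, refusing to re-run actions that keep failing."""
    guard = RepeatGuard(max_repeats)
    log: List[Tuple[str, str]] = []
    hint: Optional[str] = None
    for _ in range(max_steps):
        action = propose(hint)
        if guard.is_stuck(action):
            # Do not execute again; tell the agent why and ask for something new.
            hint = f"'{action}' has already failed {guard.failures[action]} times; try something else"
            log.append(("blocked", action))
            continue
        hint = None
        if execute(action):
            log.append(("ok", action))
            return log
        guard.record_failure(action)
        log.append(("failed", action))
    return log
```

The first version of something like this will be too crude (exact-match on the action string misses near-duplicates, for example), but that is the point: it is easy to tighten up once you have watched it fail a few times.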