But these issues seem far from insurmountable, even with current tech. It's just that the people running the experiment aren't actually trying to fix them, because they want to limit scaffolding.
From what I’ve seen, the main issues are:
1) Poor vision → can be improved through tool use, and will surely improve greatly with new models regardless
2) Poor mapping → can be improved greatly and straightforwardly through tool use
3) Poor executive function → I feel like this would benefit greatly from something like a separation of concerns. Currently my impression is that Claude gets overwhelmed with context, loses track of what’s going on, and then starts messing with its long-term planning. From a clean context, its long-term planning seems fairly decent. Same for loops: I would expect a clean-context Claude could read a summary of recent steps constituting a loop, recognize that it is in a loop, and see that it needs to try something else.
E.g., separate contexts for each of battling, navigation, summarization, long-term planning, coordination, etc. (a minimal sketch below).
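Something like the following, where `call_model` is a hypothetical stand-in for whatever model API is actually used, and all class and function names are my invention, not anything from the real setup. It keeps a private history per concern and uses a clean-context reviewer over a short action summary to catch loops:

```python
from collections import deque


def call_model(system: str, messages: list[str]) -> str:
    """Hypothetical stand-in for an LLM API call; returns the model's reply."""
    raise NotImplementedError("wire up a real model client here")


class SubAgent:
    """One concern (battling, navigation, ...) with its own private context."""

    def __init__(self, role: str):
        self.role = role
        self.history: list[str] = []  # never shared with the other agents

    def step(self, observation: str) -> str:
        self.history.append(observation)
        reply = call_model(
            system=f"You handle {self.role} only. Ignore everything else.",
            messages=self.history,
        )
        self.history.append(reply)
        return reply


class Coordinator:
    """Routes each observation to one concern; a clean-context check over a
    short summary of recent actions decides whether the agent is in a loop."""

    def __init__(self):
        self.agents = {
            role: SubAgent(role)
            for role in ("battling", "navigation", "long-term planning")
        }
        self.recent_actions: deque[str] = deque(maxlen=20)

    def classify(self, observation: str) -> str:
        # Crude keyword routing for the sketch; a real version might ask the model.
        return "battling" if "battle" in observation.lower() else "navigation"

    def stuck_in_loop(self) -> bool:
        # Fresh context on every call: the reviewer sees only the summary,
        # so it can't be overwhelmed by the full game history.
        summary = "\n".join(self.recent_actions)
        verdict = call_model(
            system=(
                "You see a summary of an agent's recent actions. Answer YES "
                "if it is repeating itself without progress, else NO."
            ),
            messages=[summary],
        )
        return verdict.strip().upper().startswith("YES")

    def step(self, observation: str) -> str:
        if self.recent_actions and self.stuck_in_loop():
            # Hand the loop summary to the planner, again in a narrow context.
            action = self.agents["long-term planning"].step(
                "Recent actions look like a loop; propose a different approach:\n"
                + "\n".join(self.recent_actions)
            )
        else:
            action = self.agents[self.classify(observation)].step(observation)
        self.recent_actions.append(action)
        return action
```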
Yes, but because this scaffolding would have to be invented separately for each task, it’s no longer really zero-shot and says little about the intelligence of Claude.
Obvious point: we might soon be able to have LLMs code up this necessary scaffolding themselves. That doesn’t seem very far off, from what I can tell.
It says that Claude lacks the intelligence to play zero-shot, and that someone has to compensate for the intelligence deficit with an exocortex.
It’s like we can track progress by measuring “performance per exocortex complexity”, where the complexity drops from “here’s a bunch of buttons to press in sequence to win” all the way down to “” (no exocortex at all).
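To make those endpoints concrete, here is one illustrative way to operationalize the metric; measuring complexity as the scaffold's byte count and adding 1 to keep the zero-scaffold case finite are just assumptions for the sketch:

```python
def exocortex_complexity(scaffold_source: str) -> int:
    """Scaffold size in bytes: a crude proxy for how much of the work
    the exocortex is doing rather than the model."""
    return len(scaffold_source.encode())


def progress_score(performance: float, scaffold_source: str) -> float:
    """Performance per unit of exocortex; the +1 keeps "" well-defined."""
    return performance / (1 + exocortex_complexity(scaffold_source))


# Maximal scaffolding: the exocortex literally contains the solution.
button_script = "UP UP A B START " * 500
print(progress_score(1.0, button_script))  # ~0.000125: the win is all exocortex
print(progress_score(1.0, ""))             # 1.0: the same win, fully zero-shot
```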
Okay, what I meant is “says little in favor of the intelligence of Claude”.
Well, vision and mapping seem like they could be pretty generic (and I expect much better vision in future base models anyway). For the third limitation, I think it’s quite possible that Claude could provide an appropriate segmentation strategy for whatever environment it is told it is being placed into.
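To make that last claim concrete, here is a sketch of what “Claude provides its own segmentation strategy” could mean; `call_model` is the same hypothetical stub as in the earlier sketch, and the JSON-array protocol is my invention:

```python
import json


def call_model(system: str, messages: list[str]) -> str:
    """Same hypothetical stand-in for an LLM API call as in the earlier sketch."""
    raise NotImplementedError("wire up a real model client here")


def propose_segmentation(environment: str) -> list[str]:
    """Ask a clean-context model to design its own separation of concerns."""
    reply = call_model(
        system=(
            "You are about to be placed in the environment named by the user. "
            "Reply with only a JSON array of sub-agent roles you want your "
            'work split across, e.g. ["navigation", "battling"].'
        ),
        messages=[f"Environment: {environment}"],
    )
    roles = json.loads(reply)
    if not (isinstance(roles, list) and all(isinstance(r, str) for r in roles)):
        raise ValueError("model did not return a JSON array of role strings")
    return roles


# Hypothetical usage with the earlier Coordinator-style sketch:
# agents = {role: SubAgent(role) for role in propose_segmentation("<environment>")}
```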
Whether this would be a display of its intelligence, or just its capabilities, is beside the point from my perspective.
This won’t work; happy to bet on it if you want to make a Manifold market.