But these issues seem far from insurmountable, even with current tech. It's just that the people running the experiment aren't actually trying to fix them, because they want to limit scaffolding.
From what I’ve seen, the main issues are:
1) Poor vision → can be improved through tool use, and will surely improve greatly with new models regardless
2) Poor mapping → can be improved greatly and straightforwardly through tool use
3) Poor executive function → I feel like this would benefit greatly from something like a separation of concerns. Currently my impression is that Claude gets overwhelmed with context, loses track of what’s going on, and then starts messing with its long-term planning. From a clean context, its long-term planning seems fairly decent. Same for loops: I would expect a clean-context Claude could read a summary of recent steps constituting a loop, recognize that it is in a loop, and see that it needs to try something else.
E.g., separate contexts for each of battling, navigation, summarization, long-term planning, coordination, etc. (a minimal sketch below).
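Something like the following, where `call_model` is a hypothetical stand-in for whatever model API is actually used, and all class and function names are my invention, not anything from the real setup. It keeps a private history per concern and uses a clean-context reviewer over a short action summary to catch loops:

```python
from collections import deque


def call_model(system: str, messages: list[str]) -> str:
    """Hypothetical stand-in for an LLM API call; returns the model's reply."""
    raise NotImplementedError("wire up a real model client here")


class SubAgent:
    """One concern (battling, navigation, ...) with its own private context."""

    def __init__(self, role: str):
        self.role = role
        self.history: list[str] = []  # never shared with the other agents

    def step(self, observation: str) -> str:
        self.history.append(observation)
        reply = call_model(
            system=f"You handle {self.role} only. Ignore everything else.",
            messages=self.history,
        )
        self.history.append(reply)
        return reply


class Coordinator:
    """Routes each observation to one concern; a clean-context check over a
    short summary of recent actions decides whether the agent is in a loop."""

    def __init__(self):
        self.agents = {
            role: SubAgent(role)
            for role in ("battling", "navigation", "long-term planning")
        }
        self.recent_actions: deque[str] = deque(maxlen=20)

    def classify(self, observation: str) -> str:
        # Crude keyword routing for the sketch; a real version might ask the model.
        return "battling" if "battle" in observation.lower() else "navigation"

    def stuck_in_loop(self) -> bool:
        # Fresh context on every call: the reviewer sees only the summary,
        # so it can't be overwhelmed by the full game history.
        summary = "\n".join(self.recent_actions)
        verdict = call_model(
            system=(
                "You see a summary of an agent's recent actions. Answer YES "
                "if it is repeating itself without progress, else NO."
            ),
            messages=[summary],
        )
        return verdict.strip().upper().startswith("YES")

    def step(self, observation: str) -> str:
        if self.recent_actions and self.stuck_in_loop():
            # Hand the loop summary to the planner, again in a narrow context.
            action = self.agents["long-term planning"].step(
                "Recent actions look like a loop; propose a different approach:\n"
                + "\n".join(self.recent_actions)
            )
        else:
            action = self.agents[self.classify(observation)].step(observation)
        self.recent_actions.append(action)
        return action
```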
Yes, but because this scaffolding would have to be invented separately for each task, it’s no longer really zero-shot and says little about the intelligence of Claude.
Obvious point: we might soon be able to have LLMs code up this necessary scaffolding themselves. That doesn’t seem very far off, from what I can tell.
It says that Claude lacks the intelligence to play zero-shot, and that someone has to compensate for the intelligence deficit with an exocortex.
It’s like we can track progress by measuring “performance per exocortex complexity”, where the complexity drops from “here’s a bunch of buttons to press in sequence to win” all the way down to “” (no exocortex at all).
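To make those endpoints concrete, here is one illustrative way to operationalize the metric; measuring complexity as the scaffold's byte count and adding 1 to keep the zero-scaffold case finite are just assumptions for the sketch:

```python
def exocortex_complexity(scaffold_source: str) -> int:
    """Scaffold size in bytes: a crude proxy for how much of the work
    the exocortex is doing rather than the model."""
    return len(scaffold_source.encode())


def progress_score(performance: float, scaffold_source: str) -> float:
    """Performance per unit of exocortex; the +1 keeps "" well-defined."""
    return performance / (1 + exocortex_complexity(scaffold_source))


# Maximal scaffolding: the exocortex literally contains the solution.
button_script = "UP UP A B START " * 500
print(progress_score(1.0, button_script))  # ~0.000125: the win is all exocortex
print(progress_score(1.0, ""))             # 1.0: the same win, fully zero-shot
```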
Okay, what I meant is “says little in favor of the intelligence of Claude”.
Well, vision and mapping seem like they could be pretty generic (and I expect much better vision in future base models anyway). For the third limitation, I think it’s quite possible that Claude could provide an appropriate segmentation strategy for whatever environment it is told it is being placed into.
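To make that last claim concrete, here is a sketch of what “Claude provides its own segmentation strategy” could mean; `call_model` is the same hypothetical stub as in the earlier sketch, and the JSON-array protocol is my invention:

```python
import json


def call_model(system: str, messages: list[str]) -> str:
    """Same hypothetical stand-in for an LLM API call as in the earlier sketch."""
    raise NotImplementedError("wire up a real model client here")


def propose_segmentation(environment: str) -> list[str]:
    """Ask a clean-context model to design its own separation of concerns."""
    reply = call_model(
        system=(
            "You are about to be placed in the environment named by the user. "
            "Reply with only a JSON array of sub-agent roles you want your "
            'work split across, e.g. ["navigation", "battling"].'
        ),
        messages=[f"Environment: {environment}"],
    )
    roles = json.loads(reply)
    if not (isinstance(roles, list) and all(isinstance(r, str) for r in roles)):
        raise ValueError("model did not return a JSON array of role strings")
    return roles


# Hypothetical usage with the earlier Coordinator-style sketch:
# agents = {role: SubAgent(role) for role in propose_segmentation("<environment>")}
```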
Whether this would be a display of its intelligence, or just its capabilities, is beside the point from my perspective.
This won’t work; happy to bet on it if you want to make a Manifold market.