jessicata comments on 2023 in AI predictions

jessicata 2 Jan 2024 0:35 UTC
22 points
3
Beat Ocarina of Time with <100 hours of playing Zelda games during training or deployment (but perhaps training on other games), no reading guides/walkthroughs/playthroughs, no severe bug exploits (those that would cut down the required time by a lot), no reward-shaping/advice specific to this game generated by humans who know non-trivial things about the game (but the agent can shape its own reward). This includes an LLM coding a program to do it. I’d say probably not by 2033.
What links here?
- AI #45: To Be Determined by Zvi (4 Jan 2024 15:00 UTC; 52 points)
- paulfchristiano 2 Jan 2024 17:34 UTC
  6 points
  0
  Parent
  It seems fairly unlikely that this specific task will be completed soon for a variety of reasons: it sounds like it technically requires training a new LM that removes all data about Zelda games; it involves a fair amount of videogame-specific engineering hassle; and it’s far from anything with obvious economic relevance + games are out of fashion (not because they are too hard). I do still think it will be done before 2033.
  If we could find a similar task that was less out of the way then I’d probably be willing to bet on it happening much sooner. Presumably this is an analogy to something that would be relevant for AI systems automating R&D and is therefore closer to what people are interested in doing with LMs.
  Although we can’t bet on it, I do think that if AI developers made a serious engineering effort on the Zelda task right now then they would have a reasonable chance of success within 2 years (I’d wildly guess 25%), and this will rise over time. I think GPT-4 with vision will do a reasonable job of identifying the next step needed to complete the game, and models trained with RL to follow instructions in video games across a broad variety of games (including 3D games with similar controls and perspective to Zelda) would likely be competent enough to solve most of the subtasks if you really went all out on it.
  I don’t have a good sense of what part you think is hard. I’d guess that the most technically uncertain part is training an RL policy that takes a description of a local task (e.g. “throw a bomb so that it explodes next to the monster’s eye”) and then actually executes it. But my sense is that you might be more concerned about high-level planning.
  - jessicata 2 Jan 2024 21:45 UTC
    2 points
    0
    Parent
    I think it’s hard because it requires some planning and puzzle solving in a new, somewhat complex environment. The AI results on Montezuma’s Revenge seem pretty unimpressive to me because they’re going to a new room, trying random stuff until they make progress, then “remembering” that for future runs. Which means they need quite a lot of training data.
    
    For short-term RL given lots of feedback, there are already decent results, e.g. in StarCraft and Dota. So the difficulty is more figuring out how to automatically scope out narrow RL problems that can be learned without too much training time.