Epistemic status: This is an off-the-cuff question.
~5 years ago there was a lot of exciting progress on game playing through reinforcement learning (RL). Now we have basically switched paradigms: pretraining massive LLMs on ~the internet and then apparently doing some fairly trivial, unsophisticated RL on top of that. This has been successful and highly popular, because interacting with LLMs is pretty awesome (at least if you haven’t done it before) and they “feel” a lot more like AGI. There’s probably somewhat more commercial use as well via code completion (some would say many other tasks too, though personally I’m not really convinced; generative image/video models will certainly be profitable, however). There’s also a sense in which LLMs are clearly more general: one RL algorithm may learn many games, but there’s typically a separately trained instance per game rather than one integrated agent, whereas you can just ask an LLM in context to play some games (see the sketch below).
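(For concreteness, this is the kind of in-context game playing I mean. A minimal sketch using the OpenAI Python client; the model name, prompt, and board encoding are just illustrative placeholders, not a claim about any particular setup:)

```python
# Minimal sketch of "just ask an LLM to play a game" via in-context prompting.
# Assumes the OpenAI Python client; model name and prompt are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

board = (
    "X | . | O\n"
    ". | X | .\n"
    ". | . | O"
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder choice; any chat model would do here
    messages=[
        {
            "role": "system",
            "content": "You are playing tic-tac-toe as X. "
                       "Reply with your move only, as row,col (0-indexed).",
        },
        {"role": "user", "content": f"Current board:\n{board}\nYour move?"},
    ],
)
print(response.choices[0].message.content)  # e.g. "2,0"
```

No per-game training happens at all: the rules, the state, and the output format are specified entirely in the prompt, which is exactly the sense in which the LLM is “more general” than one-network-per-game RL agents.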
However, I’ve been following moderately closely and I can’t think of any examples where LLMs really pushed the state of the art in narrow game playing. How much have LLMs contributed to RL research? For instance, would adding o3 to the stack easily stomp on previous StarCraft / Go / chess agents?
Meta’s Diplomacy AI (CICERO) is a clear example of how adding LLMs can improve narrow game playing. Most multiplayer games with communication will benefit in the same way.
Yes, after asking the question I realized Diplomacy would be the most likely answer. I don’t find it very satisfying, though, because Diplomacy is a text/vibes-based game: it wouldn’t have been possible to approach effectively at all without building some kind of chatbot, so it’s exactly the type of game where I’d expect LLMs to make progress even without pushing the frontier on strategy/planning.
In StarCraft II, adding LLMs (to do or aid game-time thinking) will not help the agent in any way, I believe. That’s because inference has quite large latency, especially as most of the prompt changes with all the units moving, so tactical moves are out; and strategic questions like “what is the other player building” and “how many units do they already have” are better answered by counting visible units (card-counting style) and inferring what proportion of the remaining resources has been spent (or by scouting if possible). I guess it’s possible that bots’ algorithms could be improved with LLMs, but that requires a high-quality insight, and I’m not convinced that o1 or o3 give such insights.
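(To make the “count what you see, infer the rest” heuristic concrete, here’s a toy sketch. All the mining rates and unit costs below are made-up placeholders, not real StarCraft II balance numbers:)

```python
# Toy sketch of the "count visible units, infer hidden ones" heuristic.
# All constants are illustrative placeholders, not real SC2 balance values.

MINERALS_PER_WORKER_PER_MIN = 60           # assumed mining rate
UNIT_COST = {"marine": 50, "marauder": 125}  # assumed mineral costs

def estimate_hidden_army(minutes_elapsed, scouted_workers,
                         visible_units, assumed_mix=("marine", "marauder")):
    """Estimate how many units the opponent built that we haven't seen.

    Income inferred from the scouted worker count, minus the cost of units
    already observed, bounds the opponent's unseen production.
    """
    est_income = scouted_workers * MINERALS_PER_WORKER_PER_MIN * minutes_elapsed
    spent_on_visible = sum(UNIT_COST[u] for u in visible_units)
    unseen_budget = max(0, est_income - spent_on_visible)
    avg_cost = sum(UNIT_COST[u] for u in assumed_mix) / len(assumed_mix)
    return int(unseen_budget // avg_cost)

# e.g. 8 minutes in, 16 workers scouted, 6 marines seen so far:
print(estimate_hidden_army(8, 16, ["marine"] * 6))
```

The point is that this kind of bookkeeping is cheap arithmetic over the game state, runs in microseconds, and never stalls on a multi-second inference call.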
Ma et al. 2023 is relevant here.
That article is suspiciously scarce on what actually microcontrols the units… well, glory to LLMs for decent macro management, then! (Though I believe that capability is still easier to get without text-based neural networks.)