It seems to me that the LLMs are indeed improving on complex games, which goes against your hypothesis?
Playing complex games requires both shallow thinking and deep thinking. Of course, the more brute-forcing you can do, the better you’ll do on any game.
The hypothesis was only that they wouldn’t have improved as much on complex games as on simpler, more brute-forceable games, which the data mildly supported.