Yes, I’ve seen that benchmark (I mean, I literally linked to it in my comment) and the video.
Regarding geobench specifically: the main leaderboard on that benchmark is essentially NMPZ (No Moving, Panning or Zooming). Gemini 2.5 Pro achieves an average score of 4085. That’s certainly really good for NMPZ, but I don’t think that’s Rainbolt-tier. Rainbolt-tier is more like 4700-4800, if we want an LLM that has average-case performance equal to Rainbolt’s best-case performance.
Also, LLMs can’t do the “guess the country solely by pavement” thing like he can, so there’s room for improvement.
I’ll say this much:
Rainbolt-tier LLMs already exist: https://geobench.org/
AIs trained on GeoGuessr are dramatically better than Rainbolt and have been for years.