Yes, I’ve seen that benchmark (I mean, I literally linked to it in my comment) and the video.
Regarding geobench specifically: the main leaderboard on that benchmark is essentially NMPZ (No Moving, Panning or Zooming). Gemini 2.5 Pro achieves an average score of 4085. That’s certainly really good for NMPZ, but I don’t think that’s Rainbolt-tier. Rainbolt-tier is more like 4700-4800, if we want an LLM that has average-case performance equal to Rainbolt’s best-case performance.
Also, LLMs can’t do the “guess the country solely by pavement” thing like he can, so there’s room for improvement.