It seems that this model requires a lot of argumentation that is absent from the post and only implicit in your comment. Why should I imagine that AGI would have that ability? Are there any examples of very smart humans who simultaneously acquire multiple seemingly magical abilities? If so, and if AGI scales well past human level, it would certainly be quite dangerous. But that seems to assume most of the conclusion.
Specifically, in the current paradigm this is mostly about training data, though I suppose that with sufficient integration that data will eventually become available.
Anyway, I personally have little doubt that it is possible in principle to build very dangerous AGI. The question is really about the dynamics—how long will it take, how much will it cost, how centralized and agentic will it be, how long are the tails?
It occurs to me that acquiring a few of these “magical” abilities is actually not super useful. I could replicate the helicopter one with a camera and the chess one by consulting an engine. Even if I could do those things secretly, e.g. cheat in chess tournaments, I would not suddenly become god emperor or anything. It actually wouldn’t help me much.
Number 5 isn’t that impressive without further context. And I’ve already said that the El Chapo thing is probably more about preexisting connections and resources than intelligence.
So, I’m cautious of updating on these arguments.
Modern LLMs are already like that. They have expert, or at least above-average, knowledge in many domains simultaneously. They may not have developed “magical” abilities yet, but “AI that has lots of knowledge from a vast number of different domains” is something we already see. So I think “AI that has more than one magical ability” is a pretty straightforward extrapolation.
Btw, I think it’s possible that even before AGI, LLMs will have at least 2 “magical” abilities. They’re getting better at Geoguessr, so we could have a Rainbolt-level LLM in a few years; this seems like the most likely first “magical” ability IMO.
Superhuman forecasting could be the next one, especially once LLMs become good at finding relevant news articles in real time.
Identifying book authors from a single paragraph with 99% accuracy seems like something LLMs will be able to do (or maybe even already can), though I can’t find a benchmark for that.
Accurately guessing age from a short voice sample is something that machine learning algorithms can do, so with enough training data, LLMs could probably do it too.
I’ll say this much: Rainbolt-tier LLMs already exist (https://geobench.org/).
AIs trained on GeoGuessr have been dramatically better than Rainbolt for years.
Yes, I’ve seen that benchmark (I mean, I literally linked to it in my comment) and the video.
Regarding geobench specifically: the main leaderboard on that benchmark is essentially NMPZ (No Moving, Panning or Zooming). Gemini 2.5 Pro achieves an average score of 4085. That’s certainly really good for NMPZ, but I don’t think that’s Rainbolt-tier. Rainbolt-tier is more like 4700-4800, if we want an LLM that has average-case performance equal to Rainbolt’s best-case performance.
Also, LLMs can’t do the “guess the country solely by pavement” thing like he can, so there’s room for improvement.
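For a rough sense of what those score gaps mean in kilometers, here is a small sketch using the community-derived approximation of GeoGuessr's per-round scoring curve. The formula and the world-map size constant below are community estimates, not official values, so treat the numbers as ballpark figures:

```python
import math

# Community-derived approximation of GeoGuessr's per-round score:
#     score ~= 5000 * exp(-10 * d / map_size)
# where d is the guess error in km. The world-map size constant is a
# community estimate, not an official value.
WORLD_MAP_SIZE_KM = 14916.9

def distance_from_score(score: float) -> float:
    """Guess error (km) implied by a per-round score, under the formula above."""
    return -(WORLD_MAP_SIZE_KM / 10) * math.log(score / 5000)

# An average of 4085 implies roughly 300 km of error per round;
# a 4700-4800 average implies roughly 60-90 km.
print(round(distance_from_score(4085)))
print(round(distance_from_score(4750)))
```

By this approximation, the gap between a 4085 average and a 4700–4800 average is the difference between missing by roughly 300 km and missing by under 100 km per round, which is why the two don't feel like the same tier.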