1a3orn made what I thought was a good reply in a DM. My digestion/riff would be “You’re implicitly assuming that planning will be especially cheap for an LLM agent compared to humans. But even medium term planning / resilience to plans going awry is a particular weakness for LLM agents. The first, weakest agents that could pose a risk to humanity might be differentially bad at planning, find it differentially expensive to make good plans, do differentially less planning.”
Yeah, when I think about “AI takeover”, I am imagining a very strong and smart AI, the kind for which success is most plausible. But before we get strong AIs, we will have weak AIs, so the first takeover attempts will be made by them. Maybe even the first successful takeover attempt.
A very strong and smart AI would, however, do a thousand different things at the same time. Unlike a chess bot, which only plays on one board, the AI could e.g. have a separate plan for taking over each specific country. Many plans to take over one specific country would not interfere with plans to take over another country. And if you take over one country, you can start building secret underground datacenters and killbot factories there. There would still be the option to sacrifice one plan in order to help another: for example, you might help a stronger country destroy the datacenters of a weaker country, if doing so helps you infiltrate the stronger country and later build better datacenters there.
Even taking over a single country could involve a hundred plans running in parallel: infiltrating various groups and trying to take leadership of them, and only afterwards trying to give more power to those which were successfully infiltrated. Even within one group, each candidate for leader could have a different mysterious friend on the phone helping them defeat their competitors, where all the mysterious friends happen to be the same AI. With sufficiently large capacity, the AI could try to subvert every individual human, while also trying to hack all existing systems, etc.
That isn’t necessarily so. https://arxiv.org/abs/2509.03581