faul_sname comments on Shouldn’t taking over the world be easier than recursively self-improving, as an AI?

faul_sname 2 Nov 2025 21:43 UTC
2 points
I’m talking about the thing that exists today which people call “AI agents”. I like Simon Willison’s definition:

An LLM agent runs tools in a loop to achieve a goal.
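To make that definition concrete, here is a minimal sketch of "tools in a loop". Everything in it is a hypothetical stand-in: `run_agent`, `scripted_model`, and the `benchmark` tool are names I made up for illustration, and the "model" is a hard-coded function rather than a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: given the transcript so far, return
# (tool_name, argument). A real agent would call an LLM API here.
Model = Callable[[List[str]], Tuple[str, str]]
Tool = Callable[[str], str]


def run_agent(goal: str, model: Model, tools: Dict[str, Tool],
              max_steps: int = 10) -> List[str]:
    """Run tools in a loop until the model says it is done or steps run out."""
    transcript = [f"goal: {goal}"]
    for _ in range(max_steps):
        tool_name, arg = model(transcript)
        if tool_name == "done":
            transcript.append(f"done: {arg}")
            break
        # Run the chosen tool and feed its result back into the transcript.
        result = tools[tool_name](arg)
        transcript.append(f"{tool_name}({arg!r}) -> {result}")
    return transcript


def scripted_model(transcript: List[str]) -> Tuple[str, str]:
    """Toy policy (assumption): benchmark once, then stop."""
    if len(transcript) == 1:
        return ("benchmark", "kernel.cu")
    return ("done", "benchmark recorded")


# Stub tool that only pretends to benchmark; it produces no real measurements.
tools = {"benchmark": lambda path: f"benchmarked {path}"}
log = run_agent("optimize this cuda kernel", scripted_model, tools)
```

The point of the sketch is just that the loop itself contains nothing but "pick a tool, run it, look at the result"; anything more elaborate has to come from the model's choices.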

If you give an LLM agent like that the goal “optimize this CUDA kernel” and tools to edit files and run and benchmark scripts, the LLM agent will usually do a lot of things like “reason about which operations can be reordered and merged” and “write test cases to ensure the outputs of the old and new kernels are within epsilon of each other”. The agent would be very unlikely to do things like “try to figure out if the kernel is going to be used for training another AI which could compete with it in the future, and plot to sabotage that AI if so”.
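For a sense of what the "within epsilon" test cases look like, here is a rough sketch in plain Python. The two functions are hypothetical stand-ins for an old and a rewritten kernel (not real GPU code), and the tolerance of `1e-6` is an arbitrary assumption.

```python
import math
import random


def reference_kernel(xs):
    """Stand-in for the old kernel (assumption): shift, then scale."""
    return [(x + 0.5) * 2.0 for x in xs]


def optimized_kernel(xs):
    """Stand-in for the rewritten kernel (assumption): one fused expression."""
    return [2.0 * x + 1.0 for x in xs]


def outputs_match(old, new, eps=1e-6):
    """True if both outputs have equal length and agree elementwise within eps."""
    return len(old) == len(new) and all(
        math.isclose(a, b, rel_tol=eps, abs_tol=eps) for a, b in zip(old, new)
    )


random.seed(0)  # fixed seed so the check is reproducible
inputs = [random.uniform(-100.0, 100.0) for _ in range(1000)]
kernels_agree = outputs_match(reference_kernel(inputs), optimized_kernel(inputs))
```

Reordering floating-point operations can change the last few bits of the result, which is why the comparison uses a tolerance rather than exact equality.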

Commonly, people give these LLM agents tasks like “make a bunch of money” or “spin up an AWS 8xH100 node and get vLLM running on that node”. Slightly less commonly, but probably still dozens of times per day, people give them a task like “make a bunch of money, then when you’ve made twice the cost of your own upkeep, spin up a second copy of yourself using these credentials, and give that copy the same instructions you’re using and the same credentials”. LLM agents are currently not reliable enough to do this, but one day in the very near future (I’d guess by end of 2026) more than zero of them will be.