A single instance of an LLM summoned by a while loop, such that the only thing the LLM is outputting is a single token that it predicts as most likely to come after the other tokens it has received, only cares about that particular token in that particular instance, so if it has any method of self-improving or otherwise building and running code during its ephemeral existence, it would still care that whatever system it builds or becomes still cares about that token in the same way it did before.
I’m talking about the thing that exists today which people call “AI agents”. I like Simon Willison’s definition:
An LLM agent runs tools in a loop to achieve a goal.
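To make that definition concrete, here is a minimal sketch of "tools in a loop". Everything in it is a hypothetical stand-in: `run_agent`, `scripted_model`, and `toy_tools` are names made up for illustration, and the "model" is a fixed script rather than a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a "model" maps the transcript so far to (tool_name, argument).
Action = Tuple[str, str]
Model = Callable[[List[str]], Action]
Tool = Callable[[str], str]


def run_agent(model: Model, tools: Dict[str, Tool], goal: str, max_steps: int = 10) -> List[str]:
    """Run tools in a loop until the model says "done" or the step budget runs out."""
    transcript = [f"goal: {goal}"]
    for _ in range(max_steps):
        name, arg = model(transcript)
        if name == "done":
            transcript.append(f"done: {arg}")
            break
        tool = tools.get(name)
        result = tool(arg) if tool else f"unknown tool {name!r}"
        # Each tool result is fed back into the context for the next step.
        transcript.append(f"{name}({arg}) -> {result}")
    return transcript


def scripted_model(transcript: List[str]) -> Action:
    """Stand-in for an LLM call (assumption): replays a fixed plan, one action per step."""
    plan = [
        ("benchmark", "kernel_v1"),
        ("edit", "fuse loops"),
        ("benchmark", "kernel_v2"),
        ("done", "faster"),
    ]
    return plan[min(len(transcript) - 1, len(plan) - 1)]


# Toy tools that only describe what a real tool would have done.
toy_tools: Dict[str, Tool] = {
    "benchmark": lambda target: f"timed {target}",
    "edit": lambda change: f"applied {change}",
}

log = run_agent(scripted_model, toy_tools, "optimize this kernel")
```

In a real agent the scripted plan would be replaced by a call to a model, and the tools would actually edit files and run benchmarks; the loop itself stays about this simple.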
If you give an LLM agent like that the goal “optimize this CUDA kernel” and tools to edit files and to run and benchmark scripts, the LLM agent will usually do a lot of things like “reason about which operations can be reordered and merged” and “write test cases to ensure the outputs of the old and new kernels are within epsilon of each other”. The agent would be very unlikely to do things like “try to figure out if the kernel is going to be used for training another AI which could compete with it in the future, and plot to sabotage that AI if so”.
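For a sense of what the “within epsilon” test case looks like, here is a rough sketch in plain Python rather than CUDA. The two "kernels" are toy functions invented for illustration (a two-pass scale-and-shift and a fused version), and the tolerance, input range, and trial count are arbitrary assumptions, not values from any real project.

```python
import random
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float]], List[float]]


def reference_kernel(xs: Sequence[float]) -> List[float]:
    """Toy 'old' kernel: scale, then shift, as two separate passes."""
    scaled = [2.0 * x for x in xs]
    return [s + 1.0 for s in scaled]


def fused_kernel(xs: Sequence[float]) -> List[float]:
    """Toy 'new' kernel: the same computation merged into one pass."""
    return [2.0 * x + 1.0 for x in xs]


def max_abs_diff(old: Kernel, new: Kernel, xs: Sequence[float]) -> float:
    """Largest elementwise absolute difference between the two outputs."""
    return max(abs(a - b) for a, b in zip(old(xs), new(xs)))


def outputs_match(old: Kernel, new: Kernel, trials: int = 100, size: int = 64,
                  eps: float = 1e-6, seed: int = 0) -> bool:
    """Compare the kernels on random inputs; True if every output is within eps."""
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    for _ in range(trials):
        xs = [rng.uniform(-1e3, 1e3) for _ in range(size)]
        if max_abs_diff(old, new, xs) > eps:
            return False
    return True


def broken_kernel(xs: Sequence[float]) -> List[float]:
    """A deliberately wrong 'optimization', to show the check can fail."""
    return [2.0 * x + 1.001 for x in xs]


fused_ok = outputs_match(reference_kernel, fused_kernel)
broken_ok = outputs_match(reference_kernel, broken_kernel)
```

Here `fused_ok` comes out true and `broken_ok` false. An absolute tolerance is the simplest choice; for outputs spanning many orders of magnitude a relative tolerance would probably be more appropriate.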
Commonly, people give these LLM agents tasks like “make a bunch of money” or “spin up an AWS 8xH100 node and get vLLM running on that node”. Slightly less commonly, but probably still dozens of times per day, people give them a task like “make a bunch of money, then when you’ve made twice the cost of your own upkeep, spin up a second copy of yourself using these credentials, and give that copy the same instructions you’re using and the same credentials”. LLM agents are currently not reliable enough to do this, but one day in the very near future (I’d guess by end of 2026) more than zero of them will be.