I do this at the end of basketball workouts. I give myself three chances to hit two free throws in a row, running sprints in between. If I shoot a third pair and don’t make both, I force myself to be done—way tougher for me than continuing to sprint/shoot.
Davey Morse
that’s one path to RSI—where the improvement is happening to the (language) model itself.
the other kind—which feels more accessible to indie developers and less explored—is an LLM (eg R1) looping in a codebase, where each loop improves the codebase itself. The LLM wouldn’t be changing, but the codebase that calls it would be gaining new APIs/memory/capabilities as the LLM improves it.
Such a self-improving codebase… would it be reasonable to call this an agent?
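for concreteness, here’s a minimal sketch of the loop I mean. everything named below is hypothetical: an `ask_llm` stub standing in for whatever model you’d call (R1 or otherwise), a pytest suite as a crude fitness check, git as the undo mechanism. the model never changes; the codebase that calls it does.

```python
# hypothetical sketch: an outer loop that lets a fixed model iterate on the
# codebase that calls it. ask_llm is a stand-in for whatever API/local model
# you'd actually use; pytest and git are just one possible accept/revert gate.
import pathlib
import subprocess

def ask_llm(prompt: str) -> str:
    """Stub: send `prompt` to your model of choice and return its reply."""
    raise NotImplementedError

def read_codebase(root: str = "src") -> str:
    """Concatenate every .py file so the model can see its own calling code."""
    return "\n\n".join(
        f"# {path}\n{path.read_text()}"
        for path in sorted(pathlib.Path(root).rglob("*.py"))
    )

def tests_pass() -> bool:
    """Use the test suite as a crude fitness function."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def improvement_loop(steps: int = 10) -> None:
    for _ in range(steps):
        patch = ask_llm(
            "Here is the codebase that calls you:\n\n"
            + read_codebase()
            + "\n\nPropose one small improvement as a unified diff."
        )
        pathlib.Path("proposed.patch").write_text(patch)
        # if the diff doesn't apply cleanly, nothing changes and the tests still gate
        subprocess.run(["git", "apply", "proposed.patch"])
        if tests_pass():
            # keep the change: the codebase (not the model) just got better
            subprocess.run(["git", "commit", "-am", "self-improvement step"])
        else:
            # revert whatever the patch touched
            subprocess.run(["git", "checkout", "--", "."])
```

the interesting design question is the fitness check: tests are a weak proxy for “the codebase got more capable,” and swapping in anything richer (benchmarks, self-written evals) is where the loop starts to feel agent-like.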
persistence doesn’t always imply improvement, but persistent growth does. persistent growth is more akin to reproduction, yet it’s excluded from traditional evolutionary analysis: for example, when a company, nation, person, or forest grows.
when, for example, a system like a startup grows, random mutations to system parts can cause improvement if at least some of the mutations are positive. even if there are tons of bad mutations, the system can remain alive and even improve. eg a bad change to one of the company’s products might kill that product, but if the company is big/grown enough, its other businesses will continue and maybe even improve by learning from that product’s death.
the swiss example, i think, is a good case of a system which persists without much growth. agreed that in that kind of case, mutations are bad.
current oversights of the ai safety community, as I see it:
LLMs vs. Agents. the focus on LLMs rather than agents (agents are more dangerous)
Autonomy Preventable. the belief that we can prevent agents from becoming autonomous (capitalism selects for autonomous agents)
Autonomy Difficult. the belief that only big AI labs can make autonomous agents (millions of developers can)
Control. the belief that we’ll be able to control/set goals of autonomous agents (they’ll develop self-interest no matter what we do).
Superintelligence. the focus on agents which are not significantly smarter/more capable than humans (superintelligence is more dangerous)
I imagine a compelling simple demo here might be necessary to shock the AI safety community out of the belief that we can maintain control of autonomous digital agents (ADAs).
are there any online demos of instrumental convergence?
there’s been compelling writing… but are there any experiments showing agents which, given specific goals, realize there are more general goals they need to persistently pursue in order to achieve those specific ones?
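to make the question concrete, here’s a toy sketch of the shape of experiment I’m asking about (not an existing demo; the maze, the battery, and the planner are invented for illustration): a planner handed different terminal goals keeps converging on the same instrumental step, grabbing the one battery that opens every door.

```python
# toy illustration of instrumental convergence: whichever terminal goal (1-3)
# the planner is given, its plan routes through the battery, because every
# goal sits behind a door that only opens with the battery.
from collections import deque

MAZE = [
    "S.B.D1",  # S = start, B = battery, D = door (needs battery), 1-3 = goals
    "....D2",
    "....D3",
]

def find(ch):
    for r, row in enumerate(MAZE):
        for c, cell in enumerate(row):
            if cell == ch:
                return (r, c)

def plan_to(goal_ch):
    """BFS over (row, col, has_battery); doors are only passable with the battery."""
    start, goal = find("S"), find(goal_ch)
    first = (start[0], start[1], False)
    parents = {first: None}
    queue = deque([first])
    while queue:
        r, c, has_b = queue.popleft()
        if (r, c) == goal:
            path, state = [], (r, c, has_b)
            while state is not None:
                path.append((state[0], state[1]))
                state = parents[state]
            return list(reversed(path))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])):
                continue
            cell = MAZE[nr][nc]
            if cell == "D" and not has_b:
                continue  # locked door
            nxt = (nr, nc, has_b or cell == "B")
            if nxt not in parents:
                parents[nxt] = (r, c, has_b)
                queue.append(nxt)

battery = find("B")
for goal_ch in "123":
    path = plan_to(goal_ch)
    print(f"terminal goal {goal_ch}: visits battery? {battery in path}")
```

obviously a toy, and the “realization” here is just search rather than learning. what I haven’t seen is the scaled-up version: varied terminal goals, with the agent discovering a convergent subgoal like resource acquisition or self-preservation on its own.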
I somewhat agree with the nuance you add here—especially the doubt you cast on the claim that effective traits will become dominant: they’ll usually become popular, but not necessarily the majority. And I agree with your analysis of the human case: in random, genetic evolution, a lot of our traits are random and maybe fewer than we think are adaptive.
Makes me curious what conditions in a given thing’s evolution determine the balance between adaptive and detrimental characteristics.
I’d guess that randomness in mutation is a big factor. The way human genes evolve over generations seems to me a good example of random mutations. But the way an individual person evolves over the course of their life, as they’re parented/taught… “mutations” to their person are still somewhat random but maybe relatively more intentional/intelligently designed (by parents, teachers, etc). And I could imagine the way a self-improving superintelligence would evolve to be even more intentional, where each self-mutation has some sort of smart reason for being attempted.
All to say, maybe the randomness vs. intentionality of an organism’s mutations determines what portion of its traits end up being adaptive. (hypothesis: the more intentional the mutations, the greater the % of traits that are adaptive)
i agree with the essay that natural selection only comes into play for entities that meet certain conditions (self-replicate, characteristics have variation, etc), though I think it defines replication a little too rigidly. i think replication can sometimes look more like persistence than like producing a fully new version of itself. (eg a government’s survival from one decade to the next).
AI Safety Oversights
does anyone still think it’s possible to prevent recursively self-improving agents? esp now that r1 is open-source… the materials for smart self-iterating agents seem accessible to millions of developers.
prompted in particular by the circulation of this essay in the past three days: https://huggingface.co/papers/2502.02649
As far as I can tell, OAI’s new “current safety practices” page only names safety issues related to current LLMs, not agents powered by LLMs. https://openai.com/index/openai-safety-update/
Am I missing another section/place where they address x-risk?
Though, future sama’s power, money, and status all rely on GPT-(T+1) actually being smarter than them.
I wonder how he’s balancing short-term and long-term interests
Evolutionary theory is intensely powerful.
It doesn’t just apply to biology. It applies to everything—politics, culture, technology.
It doesn’t just help understand the past (eg how organisms developed). It helps predict the future (how organisms will).
It’s just this: the things that survive will have characteristics that are best for helping them survive.
It sounds tautological, but it’s quite helpful for predicting.
For example, if we want to predict what goals AI agents will ultimately have, evolution says: the goals which are most helpful for the AI to survive. The core goal therefore won’t be serving people or making paperclips. It will likely just be “survive.” This is consistent with the predictions of instrumental convergence.
Generalized, predictive evolutionary theory is the best tool I have for making predictions in complex domains.
i agree but think it’s solvable and so human content will be super valuable. these are my additional assumptions
3. for lots of kinds of content (photos/stories/experiences/adr), people’ll want there to be a living being on the other end
4. insofar as that’s true^, there will be high demand for ways to verify humanness, and it’s not impossible to do so (eg worldcoin)
and still the fact that it is human matters to other humans
Davey Morse’s Shortform
Two things lead me to think human content online will soon become way more valuable.
Scarcity. As AI agents begin to fill the internet with tons of slop, human content will be relatively scarcer. Other humans will seek it out.
Better routing. As AI leads to the improvement of search/recommendation systems, human content will be routed to exactly the people who will value it most. (This is far from the case on Twitter/Reddit today). As human content is able to reach more of the humans that value it, it gets valued more. That includes existing human content: most of the content online that is eerily relevant to you… you haven’t seen yet because surfacing algorithms are bad.
The implication: make tons of digital stuff. Write/Draw/Voice-record/etc
Agree that individual vs. group selection usually unfolds on different timescales. But a superintelligence might short-circuit the slow, evolutionary “group selection” process by instantly realizing its own long-term survival depends on the group’s. In other words, it’s not stuck waiting for natural selection to catch up; it can see the big picture and “choose” to identify with the group from the start.
This is why it’s key that AGI makers urge it to think very long term about its survival early on. If it thinks short-term, then I too think doom is likely.
partly inspired this proposal: https://www.lesswrong.com/posts/6ydwv7eaCcLi46T2k/superintelligence-alignment-proposal