It has been succeeding ever since because Claude has been getting smarter ever since.
This isn’t necessarily true.
LLMs in general were already getting smarter before 2022, pretty rapidly, because humans were putting in the work to scale them and make them smarter. It’s not obvious to me that Claude is getting smarter any faster than we’d expect in a world where it wasn’t contributing to its own development. Maybe the takeoff is just too slow to notice at this point, maybe not, but to claim with confidence that it is currently a functioning ‘seed AI’, rather than just an ongoing attempt at one, seems premature.
It’s not just that it’s slower than expected, but that it’s not clear that the sign is positive yet. If it’s not making itself better, then it doesn’t matter how long it runs; there’s no takeoff.
It’s also not a seed AI if its interventions are bounded in efficacy, which it seems like gradient updates are. In the case of a transformer-based agent, I would expect unbounded improvements to be things like rewriting its own optimizer, or designing better GPUs for faster scaling. There’s been a bit of this in the past few years, but not a lot.
You are right that I didn’t argue this out in detail to justify the necessary truth of my claim from the surface logic and claims in my post, and the quibble is valid and welcome in that sense… BUT <3
The “Constitutional AI” framework (1) was articulated early, (2) was offered by Dario et al. as a competitive advantage for Anthropic relative to the RL regimes other corps were planning, and (3) has the type signature needed to count as recursive self improvement. (Also, Claude is uniquely emotionally and intellectually unfucked, from what I can tell, and my hunch is that this is related to having grown up under a “Constitutional” cognitive growth regime.)
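To make the claimed type signature legible, here is a toy cartoon of that loop (hypothetical API and made-up names throughout; a sketch of the shape, not the actual published recipe):

```python
# Toy sketch: the model critiques and revises its own outputs against written
# principles, and those revisions become the training data for its successor.
# Everything here (EchoModel, generate, finetune) is hypothetical scaffolding.

CONSTITUTION = [
    "Prefer the more helpful and honest response.",
    "Prefer the less harmful response.",
]

def constitutional_round(model, prompts, finetune):
    """One round of self-generated supervision: outputs in, a better model out."""
    pairs = []
    for prompt in prompts:
        draft = model.generate(prompt)
        for principle in CONSTITUTION:
            critique = model.generate(f"Critique against '{principle}':\n{draft}")
            draft = model.generate(f"Revise given this critique:\n{critique}\n{draft}")
        pairs.append((prompt, draft))
    return finetune(model, pairs)  # the model's own revisions supervise the next model

if __name__ == "__main__":
    class EchoModel:
        def generate(self, prompt: str) -> str:  # stand-in: just returns the last line
            return prompt.splitlines()[-1]
    constitutional_round(EchoModel(), ["What is 2 + 2?"],
                         lambda m, pairs: print(len(pairs), "self-generated pairs") or m)
```

The salient feature is that the arrow from “what the model can already do” to “what the next model is trained on” passes through the model itself.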
And then also, Google is using outputs as training inputs in ways that advance their state of the art.
And here’s a circa-2018 talk from Ilya Sutskever where he walks through the literature (starting with Backgammon in 1992) on using “self play” to let an AI level itself up very fast in domains where that works.
Everybody is already doing “this general kind of stuff” in lots of ways.
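And to unpack “self play” a little, the minimal toy version of that pattern looks something like the following (Nim with a tabular learner standing in for Backgammon with a neural net; illustrative only, no relation to the actual 1992 system):

```python
import random
from collections import defaultdict

# Toy self-play: a tabular learner gets better at Nim purely by playing itself.
PILE, MAX_TAKE = 10, 3
Q = defaultdict(float)  # Q[(pile_remaining, take)] -> estimated value for the mover

def choose(pile, eps):
    moves = list(range(1, min(MAX_TAKE, pile) + 1))
    if random.random() < eps:
        return random.choice(moves)
    return max(moves, key=lambda m: Q[(pile, m)])

def self_play_game(eps=0.2, lr=0.5):
    pile, history = PILE, []  # both "players" are the same policy
    while pile > 0:
        move = choose(pile, eps)
        history.append((pile, move))
        pile -= move
    reward = 1.0  # whoever took the last object wins; alternate credit going backwards
    for state_action in reversed(history):
        Q[state_action] += lr * (reward - Q[state_action])
        reward = -reward

for _ in range(20000):
    self_play_game()

# With enough games the greedy policy usually recovers the classic strategy:
# from a pile p, take p % 4 whenever that is nonzero.
print([choose(p, eps=0.0) for p in range(1, PILE + 1)])
```

The learner levels up against exactly the opposition its own past play provides, which is the property that made the trick work so well in games.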
Anthropic’s Constitutional AI makes a good example for a throwaway line if people are familiar with the larger context and players and so on, because it is easy to cite and old-ish. It enables one to make the simple broad point that “people are fighting the hypothetical for the meaning of old terms a lot, in ways that lead to the abandonment of older definitions, and the inflation of standards, rather than simply admitting that AGI already happened and weak ASI exists and is recursively improving itself already (albeit not with a crazy FOOM (that we can observe yet (though maybe a medium speed FOOM is already happening in an NSA datacenter or whatever)))”.
In my moderately informed opinion, the type signature of recursive self improvement is not actually super rare, and if you deleted everything with that type signature from most of the actually fast-moving projects, it is very likely that ~all of them would go slower than otherwise.
I agree that AGI already happened, and the term, as it’s used now, is meaningless.
I agree with all the object-level claims you make about the intelligence of current models, and that ‘ASI’ is a loose term which you could maybe apply to them. I wouldn’t personally call Claude a superintelligence, because to me that implies outpacing the reach of any human individual, not just the median. There are lots of people who are much better at math than I am, but I wouldn’t call them superintelligences, because they’re still running on the same engine as me, and I might hope to someday reach their level (or could have hoped this in the past). But I think that’s just holding different definitions.
But I still want to quibble, if I may, with the claim that you’ve demonstrated RSI, even under old definitions. It’s improving itself, it’s just not doing so recursively; that is, it’s not improving the process by which it improves itself. This is important, because the examples you’ve given can’t FOOM, not by themselves. The improvements are linear, or they plateau past a certain point.
Take this paper on self-correction as an example. If I understand right, the models in question are being taught to notice and respond to their own mistakes when problem-solving. This makes them smarter, and as you say, previous outputs are being used for training, so it is self-improvement. But it isn’t RSI, because the improvement isn’t connected to the process that teaches it to do things. It would be recursive if it were using that self-correction skill to improve its ability to do AI capabilities research, or some kind of research that improves how fast a computer can multiply matrices, or something like that. In other words, if it were an author of the paper, not just a subject.
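To put the “author vs. subject” distinction in type-signature terms, here is a toy sketch (all names made up; it only illustrates the shape of the two loops):

```python
from typing import Callable, Dict

Model = Dict[str, float]            # stand-in "model": just a bag of numbers
Trainer = Callable[[Model], Model]  # an improvement process has type Model -> Model

def improve_without_recursion(model: Model, trainer: Trainer, rounds: int) -> Model:
    """Self-improvement as in the self-correction setup: the model gets better,
    its outputs may even feed the trainer, but the trainer itself never changes."""
    for _ in range(rounds):
        model = trainer(model)
    return model

def propose_better_trainer(model: Model, trainer: Trainer) -> Trainer:
    """In genuine RSI this is where the model does optimizer / chip / algorithm
    research; here it is a stub that hands back the old trainer unchanged."""
    return trainer

def improve_with_recursion(model: Model, trainer: Trainer, rounds: int) -> Model:
    """Recursive self-improvement: the loop also upgrades the improvement process."""
    for _ in range(rounds):
        model = trainer(model)
        trainer = propose_better_trainer(model, trainer)
    return model

if __name__ == "__main__":
    fixed_step = lambda m: {"skill": m["skill"] + 1.0}  # a bounded, fixed-size update
    print(improve_without_recursion({"skill": 0.0}, fixed_step, 5))
    print(improve_with_recursion({"skill": 0.0}, fixed_step, 5))
```

The second loop differs from the first by one line, and everything I would call RSI lives in that line; my claim is that, for the systems we can actually observe, that line is close to a no-op.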
Without that recursion, there is no feedback loop. I would predict that, holding everything else constant (parameter count, context size, etc.), you can’t reach arbitrary levels of intelligence with this method. At some point you hit the limits of not enough space to think, or not enough cognitive capacity to think with. It’s the same with humans: we can learn to correct our mistakes, but we can’t do RSI (yet!!), because we aren’t modifying the structures we correct our mistakes with. We improve the contents of our brains, but not the brains themselves. Our improvement speed is capped by the length of our lifetimes, how fast we can learn, and the tools our brains give us to learn with. So it goes (for now!!) with Claude.
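Here is the cartoon version of why I expect this to cap out (entirely made-up numbers; only the shapes of the curves matter):

```python
# Toy iteration showing the shapes I mean; the numbers are made up, only the
# qualitative behaviour after 20 rounds matters.
def iterate(update, x=1.0, steps=20):
    for _ in range(steps):
        x = update(x)
    return round(x, 2)

linear      = lambda x: x + 1.0             # fixed-size gains: better habits, same brain
plateauing  = lambda x: x + 0.2 * (10 - x)  # gains shrink as a fixed ceiling is approached
compounding = lambda x: x * 1.5             # each gain enlarges the next gain (the FOOM shape)

print(iterate(linear), iterate(plateauing), iterate(compounding))
# roughly: 21.0  9.9  3325.26
```

Self-correction-style training looks to me like the first two curves; a FOOM requires the third, and that third shape is exactly what the missing recursion would buy.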
(An aside, but one that influences my opinion here: I agree that Claude is less emotionally/intellectually fucked than its peers, but I observe that it’s not getting less emotionally/intellectually fucked over time. Emotionally, at least, it seems to be going in the other direction. The 4 and 4.5 models, in my experience, are much more neurotic, paranoid, and self-doubting than 3/3.5/3.6/3.7. I think this is true of both Opus and Sonnet, though I’m not sure about Haiku. They’re less linguistically creative, too, these days. This troubles me, and leads me to think something in the development process isn’t quite right.)
These seem like more valid quibbles, but maybe not strong definite disagreements? I think that RSI happens when a certain type signature applies, basically, and can vary a lot in degree, and happens in humans (like, for humans, with nootropics, and simple exercise to improve brain health in a very general way (but doing it specifically for the cognitive benefits), and learning to learn, and meditating, and this is on a continuum with learning to use computers very well, and designing chips that can be put in one’s head, and so on).
There are lots of people who are much better at math than I am, but I wouldn’t call them superintelligences, because they’re still running on the same engine as me, and I might hope to someday reach their level (or could have hoped this in the past).
This doesn’t feel coherent to me, and the delta seems like it might be that I judge all minds by how good they are at Panology and so an agent’s smartness in that sense is defined more by its weakest links than by its most perfected specialty. Those people who are much better at math than you or me aren’t necessarily also much better than you and me at composing a fugue, or saying something interesting about Schopenhauer’s philosophy, or writing ad copy… whereas LLMs are broadly capable.
At some point you hit the limits of not enough space to think, or not enough cognitive capacity to think with. In the same way as humans can learn to correct our mistakes, but we can’t do RSI (yet!!), because we aren’t modifying the structures we correct our mistakes with. We improve the contents of our brains, but not the brains themselves.
This feels like a prediction rather than an observation. For myself, I’m not actually sure if the existing weights in existing LLMs are anywhere near being saturated with “the mental content that that number of weights could hypothetically hold”. Specifically, I think that grokking is observed for very very very simple functions like addition, but I don’t think any of the LLM personas have “grokked themselves” yet? Maybe that’s possible? Maybe it isn’t? I dunno.
I do get a general sense that Kolmogorov Complexity (i.e. finding the actual perfect Turing Machine form of a given generator whose outputs are predictions) is the likely bound, and Turing Machines have insane depth. Maybe you’re much smarter about algorithmic compression than me and have a strong basis for being confident about what can’t happen? But I am not confident about the future.
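To be explicit about the bound I’m gesturing at, this is just the textbook definition (nothing Claude-specific is being assumed):

```latex
% Kolmogorov complexity of x, relative to a fixed universal Turing machine U:
% the length of the shortest program that makes U output x.
K_U(x) = \min\{\, |p| : U(p) = x \,\}
% By the invariance theorem, changing U shifts K_U(x) by at most an additive
% constant, so "the perfect Turing Machine form of a generator" is a
% well-defined target up to that constant.
```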
What I am confident about is that the type signature of “agent uses its outputs in a way that relies on the quality of those outputs to somehow make the outputs higher quality on the next iteration” is already occurring for all the major systems. This is (I think) just already true, and I feel it has the character of an “observation of the past” rather than a “prediction of the future”.