Initial attempts by API users to put LLMs into agentic wrappers (e.g. AutoGPT, BabyAGI) don’t seem to have made any progress.
I would not expect those attempts to work, and their failures don’t update me at all against the possibility of RSI.
If those things failing to work doesn’t update you against RSI, then their succeeding can’t update you towards the possibility of RSI.
I personally would not be that surprised, even taking into account the failures of the first month or two, if someone manages to throw together something vaguely semi-functional in that direction, and if the vaguely semi-functional version can suggest improvements to itself that sometimes help. Does your model of the world exclude that possibility?
A glider flies, but without self-propulsion it doesn’t go very far. Would seeing a glider land before traveling a long distance update you against the possibility of fixed-wing flight working? It might, but it needn’t. Someone comes along and adds an engine and propeller, and all of a sudden the thing can really fly. With the addition of one extra component, you update all the way to “fixed-wing flight works”.
It’s the same thing here. Maybe these current systems are relatively good analogues of what will later be RSI-ing AGI and all they’re missing right now is an engine and propeller. If someone comes along and adds a propeller and engine and gets them really flying in some basic way, then it’s perfectly reasonable to update toward that possibility.
(Someone please correct me if my logic is wrong here.)
If I had never seen a glider before, I would think there was a nonzero chance that it could travel a long distance without self-propulsion. So if someone runs the experiment of “see if you can travel a long distance with a fixed-wing glider and no other innovations”, I could either observe that it works, or observe that it doesn’t.
If you can travel a long distance without propulsion, that obviously updates me very far in the direction of “fixed-wing flight works”.
So by conservation of expected evidence, observing that a glider with no propulsion doesn’t make it very far has to update me at least slightly in the direction of “fixed-wing flight does not work”. Because otherwise I would expect to update in the direction of “fixed-wing flight works” no matter what observation I made.
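For concreteness, the identity underlying conservation of expected evidence is just the law of total probability; writing H for “fixed-wing flight works” and E for “the glider travels a long distance” (notation mine):

$$P(H) \;=\; P(H \mid E)\,P(E) \;+\; P(H \mid \neg E)\,P(\neg E)$$

Since P(H) is a weighted average of P(H | E) and P(H | ¬E), if observing E would raise your credence above the prior, then observing ¬E must lower it below the prior, provided both outcomes had nonzero probability.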
Note that OP said “does not update me at all”, not “does not update me very much”, and the use of the language “update me” implies the strong, Bayesian-evidence sense of the words. This is not a nit I would have picked if OP had said “I don’t find the failures of AutoGPT and friends to self-improve to be at all convincing that RSI is impossible”.
I agree with @awg. I think that a clumsy, incomplete attempt at RSI failing, when I expected it to fail based on my model of the situation, is very little evidence that a strong attempt would fail. I think that seeing the clumsy, incomplete attempt succeed is strong evidence that the problem is easier than I thought it was, and that RSI is definitely possible. It is me realizing I made a mistake, but the mistake is
“oh, my complicated ideas weren’t even needed, RSI was even easier than I thought. Now I can be totally confident that RSI is near-term possible instead of just pretty sure.”
not
“Huh, didn’t work exactly in the way I predicted, guess I know nothing at all about the world now, thus I can’t say if RSI is possible.”
Also, the statement that I’m making is that the current attempts at RSI via AutoGPT/BabyAGI are weak and incomplete. Obviously a bunch of people are putting in work to improve them. I don’t know what those attempts will look like a year from now. I slightly suspect that there is secret RSI work going on in the major labs, and that those highly competent, well-resourced, well-coordinated teams will beat the enthusiast amateurs to the punch. I’m not highly confident in that prediction, though.
If you had said “very little evidence” I would not have objected. But if there are several possible observations which update you towards RSI being plausible, and no observations that update you against RSI being plausible, something has gone wrong.
Oh, there are lots of observations which would update me against RSI being plausible. In fact, I have a list of specific experiments I would like to see done whose results would convince me that RSI is much harder than I expect and not a near-term worry. I’m not going to discuss that list, however, because I don’t have a safe way to do so. So there absolutely are pieces of evidence which would sway me; they just aren’t ‘evidence that RSI is easier than I expect’. Such evidence would convince me that RSI is easy, not that it is impossible.
Hm, I think I’m still failing to communicate this clearly.
RSI might be practical, or it might not be practical. If it is practical, it might be trivial, or it might be non-trivial.
If, prior to AutoGPT and friends, you had assigned 10% to “RSI is trivial”, with the remaining 90% split, say, 60% to “RSI is practical but not trivial” and 30% to “RSI is impractical”, and you then make an observation of whether RSI is trivial, you should expect the following (the numbers are worked through in the sketch after these two cases):
10% of the time, you observe that RSI is trivial. You update to 100% “RSI is trivial”, 0% “RSI is practical but not trivial”, 0% “RSI is impractical”.
90% of the time, you observe that RSI is not trivial. You update to 0% “RSI is trivial”, 67% “RSI is practical but not trivial”, 33% “RSI is impractical”.
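A minimal sketch of that second update in code, assuming the 60/30 split above and treating the observation as simply ruling out “RSI is trivial” (the numbers are the example’s, not a claim about actual probabilities):

```python
# Example priors from the comment above.
p_trivial, p_practical, p_impractical = 0.10, 0.60, 0.30

# Observation: the trivial attempt fails, so "RSI is trivial" drops to 0
# and the two remaining hypotheses are renormalized.
remaining = p_practical + p_impractical
print(p_practical / remaining)    # ~0.67 -> "RSI is practical but not trivial"
print(p_impractical / remaining)  # ~0.33 -> "RSI is impractical"
```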
By “does your model exclude the possibility of RSI-through-hacking-an-agent-together-out-of-LLMs”, I mean the following: prior to someone first hacking together AutoGPT, you thought that there was less than a 10% chance that something like that could do the task of “make and test changes to its own architecture, and keep the ones that worked” well enough to get better at that very task.
Assigning 10% seems like a lot in the context of this question, even for purposes of an example.
What if you had assigned less than 0.01% to “RSI is so trivial that the first kludged loop to GPT-4 by an external user without access to the code or weights would successfully self-improve”? It would have been at least that surprising to me if it had worked.
Failure to achieve it was not surprising at all, in the sense that any update I made from this would be completely swamped by the noise in such an estimate, and definitely not worth the cognitive effort to consciously carry it through to any future estimates of RSI plausibility in general.
What if you had assigned less than 0.01% to “RSI is so trivial that the first kludged loop to GPT-4 by an external user without access to the code or weights would successfully self-improve”?
I would think you were massively overconfident in that. I don’t think you could make 10,000 predictions like that and only be wrong once (for a sense of intuition, that’s like making one prediction per hour, 8 hours per day, 5 days a week for 5 years, and being wrong once).
Unless you mean “recursively self-improve all the way to godhood” instead of “recursively self-improve to the point where it would discover things as hard as the first improvement it found in like 10% as much time as it took originally”.
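(The arithmetic behind that 10,000-predictions intuition pump, for anyone checking:)

```python
# One prediction per hour, 8 hours/day, 5 days/week, 52 weeks/year, for 5 years.
print(8 * 5 * 52 * 5)  # 10400, i.e. roughly 10,000 predictions
```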
For reference, here is why I did give at least 10% to “the dumbest possible approach will work to get meaningful improvement”: humans spent many thousands of years not developing much technology at all, and then, a few thousand years ago, suddenly started doing agriculture, building cities, and inventing tools. The difference between humans who do agriculture and humans who don’t isn’t pure genetics: humans came to the Americas over 20,000 years ago, agriculture has only been around for about 10,000 of those 20,000 years, and yet there were fairly advanced agricultural civilizations in the Americas thousands of years ago. That says to me that, for humans at least, most of our ability to do impressive things comes from our ability to accumulate a bunch of tricks that work over time and communicate those tricks to others.
So if it had turned out that “the core of effectiveness for a language model is a dumb wrapper script and the ability to invoke copies of itself with a different wrapper script, and that’s enough for it to close the gap between the capabilities of the base language model and the capabilities of something as smart as the base language model but as coherent as a human”, I would have been slightly surprised, but not surprised enough that I could have made 10 predictions like that and only been wrong about one of them. Certainly not 100 or 10,000 predictions like that.
Edit: Keep in mind that the dumbest possible approach of “define a JSON file that describes the tool and ensure that that JSON file has a link to detailed API docs” does work for teaching GPT-4 how to use tools.
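For concreteness, here is a minimal sketch of the kind of tool-describing JSON that sentence is pointing at, written as a Python dict. The field names and URL are illustrative assumptions on my part, not the exact schema of any particular plugin or tool-use system:

```python
import json

# Hypothetical tool description: name the tool, say what it does,
# and point at detailed API docs. Field names are illustrative only.
tool_description = {
    "name": "weather_lookup",
    "description": "Look up the current weather for a given city.",
    "api_docs_url": "https://example.com/weather-api/docs",
}

print(json.dumps(tool_description, indent=2))
```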
My estimate is based on the structure of the problem and the entity trying to solve it. I’m not treating it as some black-box instance of “the dumbest thing can work”. I agree that the latter types of problem should be assigned more than 0.01%.
I already knew quite a lot about GPT-4’s strengths and weaknesses, and about the problem domain it needs to operate in for self-improvement to take place. If I were a completely uneducated layman from 1900 (or even from 2000, probably) then a probability of 10% or more might be reasonable.
Ah yes, I see what you mean. This seems like trivial semantic nitpicking to me, but I will go ahead and update the wording of the sentence to allow for the fact that I had some tiny amount of belief that a very crude AutoGPT approach would work, and thus that seeing it not immediately work altered my overall beliefs infinitesimally.
Yeah. I had thought that you used the wording “don’t update me at all” instead of “aren’t at all convincing to me” because you meant something precise that was not captured by the fuzzier language. But on reflection it’s probably just that language like “updating” is part of the vernacular here now.
Sorry, I had meant that to be a one-off side note, not a whole thing.
The bit I actually was surprised by was that you seem to think there was very little chance that the crude approach could have worked. In my model of the world, “the simplest thing that could possibly work” ends up working a substantial amount of the time. If your model of the world says the approach of “just piling more hacks and heuristics on top of AutoGPT-on-top-of-GPT-4 will get it to the point where it can come up with additional helpful hacks and heuristics that further improve its capabilities” almost certainly won’t work, that’s a bold and interesting advance prediction in my book.
My guess at whether GPT-4 can self-improve at all, given a lot of carefully engineered external systems and access to its own source code and weights, is a great deal higher than my guess that AutoGPT would self-improve. The failure of AutoGPT says nothing[1] to me about that.
Thanks @JBlack, your comment describes my point of view as well.
[1] In the usual sense of not being anywhere near worth the effort to include it in any future credences.