Examining intuitions around discontinuity driven by recursive self-improvement (RSI)
I had a couple of unexamined intuitions that made the case for abrupt takeoff triggered by self-aware RSI seem plausible to me. I'll lay out a couple of lines of intuition for why RSI might lead to a discontinuity in capabilities and then debunk them. On reflection, I now see rich forms of self-awareness in RSI as entirely compatible with gradual takeoff. There are other, possibly better, intuitions for RSI takeoff, though; for instance, the points below do not address super-exponential progress from automated researchers!
My old intuitions:
Once an AI can engage in targeted self-modification, this capacity will unlock some off-trend acceleration in capabilities improvement
Currently, AIs are targeted at arbitrary cognitive tasks, but they could eventually be targeted more precisely at improving on the questions that yield a higher payoff in terms of agency/intelligence/[other general capabilities]
Both of these are variants of the idea “Fine-grained self-awareness in a learner can unlock far more efficient learning”.
Now let’s examine them.
Once an AI can engage in targeted self-modification, this capacity will unlock some off-trend acceleration in capabilities improvement
Assume an AI has access to some rich interface for self-modification. Previously, learning was mostly SGD or something similar, but the problem faced by any learning rule remains: how do you search the parameter space, how do you assign credit, and so on? Why should introspective access provide more than an incremental improvement to the scaling law's coefficient? For humans, for instance, our introspective access is far too weak to tell us anything about which neurological edits to make, even if we had the tools to make those edits cleanly!
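To make the "coefficient, not exponent" point concrete, here is a minimal sketch under a stylised power-law scaling assumption (the functional form and all numbers are illustrative, not fitted to any real model): improving the coefficient is worth a fixed compute multiplier, i.e. a shift along the same trend line, not a change in its slope.

```python
# Hedged sketch: why a better constant in a power-law scaling curve only
# buys a fixed compute multiplier, not an off-trend change. Assumes a
# stylised loss curve L(C) = a * C**(-b); all numbers are illustrative.

def loss(compute, a=10.0, b=0.05):
    """Stylised power-law scaling: loss as a function of training compute."""
    return a * compute ** (-b)

def equivalent_compute_multiplier(gain, b=0.05):
    """If self-modification improves the coefficient a by a factor `gain`,
    this is the constant amount of extra compute that improvement is 'worth'."""
    return gain ** (1.0 / b)

gain = 1.10                                  # a 10% better coefficient
m = equivalent_compute_multiplier(gain)      # ~6.7x compute at b = 0.05
print(m)

# The improved curve at compute C matches the old curve at compute C * m:
# a one-off level shift along the same power law; the exponent b (the trend)
# is unchanged.
assert abs(loss(1e22, a=10.0 / gain) - loss(1e22 * m)) < 1e-9
```

A large constant factor, to be sure, but still a constant factor: on this picture, introspective self-modification moves you along the existing curve rather than onto a steeper one.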
Currently, AIs are targeted at arbitrary cognitive tasks, but they could eventually be targeted more precisely at improving on the questions that yield a higher payoff in terms of agency/intelligence/[other general capabilities]
I see two sub-problems here.
(2a) Problem selection and creation: Of course, some weak version of active learning is possible! You can calibrate an amortised model to predict which questions it 'already knows' and which are challenging (a minimal sketch of this weak version appears after (2b)). But what does that buy us? Again, a minor speed-up. To do better, we would need to be deeply strategic about problem selection and creation. That again sounds like an intrinsically hard problem: you have to search the combinatorially large space of problems to find one that you must then recognise as likely to develop some capacity of interest.
(2b) Are there problems which 'directly' target core capability latents? What would it mean for a problem to provide a radically better learning signal on long-horizon agency, or on IQ, than another problem? It seems unlikely that there are problems which, across a reasonable distribution of learners, are far better at improving these competencies than existing human curricula and questions. Problems that are particularly valuable to an individual learner (AI) do exist, but, as in (2a), they are intrinsically hard to find.
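As a concrete illustration of the 'weak' active learning mentioned in (2a), here is a minimal sketch of uncertainty-based problem selection. The `model.predict_proba` interface and the candidate pool are assumptions for illustration, not any particular training stack:

```python
# Minimal sketch of 'weak' active learning: re-rank an existing pool of
# problems by how uncertain the model is about them, and train on the top
# slice. This buys a modest sample-efficiency gain over uniform sampling,
# but it only re-ranks problems that already exist -- it does not perform
# the strategic problem *creation* discussed in (2a).
import numpy as np

def predictive_entropy(probs):
    """Entropy of the model's answer distribution: a crude proxy for
    'the model does not already know this'."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs), axis=-1)

def select_problems(model, candidate_pool, k=256):
    """Pick the k candidate problems the model is least confident about.

    `model.predict_proba(problem)` is a hypothetical call returning the
    model's probability distribution over answers for that problem."""
    scores = np.array([predictive_entropy(model.predict_proba(p))
                       for p in candidate_pool])
    top = np.argsort(-scores)[:k]
    return [candidate_pool[i] for i in top]
```

Note that nothing in this loop tells you which problems bear on agency, long-horizon planning, or any other capability latent of interest; it only sorts whatever pool you already have.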
As an example of these phenomena, consider the obstacles to improving at long-horizon decision making: situations where very long horizons matter are sparse, and opportunities to train that capability (i.e., to get dense feedback on genuinely long-run plans) are just as sparse. What's more, the capacity to acquire increasingly long-horizon thinking may be generic, but particular long-horizon plans remain domain-specific.