DL towards the unaligned Recursive Self-Optimization attractor

Consider this abridged history of recent ML progress:

A decade or two ago, computer vision was a field that employed dedicated researchers who designed specific, increasingly complex feature recognizers (SIFT, SURF, HOG, etc.). These were usurped in the 2010s by deep CNNs with fully learned features[1], which subsequently saw success in speech recognition, various NLP tasks, and much of the rest of AI, competing with the other general ANN models of the day (various RNNs and LSTMs). Then SOTA architectures in vision and NLP evolved separately towards increasing complexity, until the simpler, more general transformers took over NLP and quickly spread to other domains (even RL), there often competing with newer simple/general architectures arising within those domains, such as MLP-Mixers in vision. Waves of colonization in design-space.

So the pattern is: increasing human optimization power steadily pushes up architecture complexity, until the trend is upset/reset by a new simpler, more general model that substitutes automated machine optimization power for human optimization power[2], enabled by improved compute scaling, à la the bitter lesson. DL isn’t just a new AI/ML technique; it’s a paradigm shift.

Ok, fine, then what’s next?

All of these models, from the earliest deep CNNs on GPUs up to GPT-3 and EfficientZero, generally have a few major design components that haven’t much changed:

  1. A human-designed architecture, rather than one that is learned or even SGD-learnable at all

  2. A human-designed backprop SGD variant (with only a bit of evolution from vanilla SGD to Adam & friends; both updates are sketched just below)
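
To make concrete just how little the learning rule has changed, here is a minimal sketch of the vanilla SGD update next to the Adam update, in NumPy with illustrative hyperparameters (a toy sketch, not any particular framework’s API):

```python
import numpy as np

def sgd_update(w, grad, lr=0.01):
    # Vanilla SGD: step directly down the gradient.
    return w - lr * grad

def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: the same gradient step, rescaled by running estimates
    # of the gradient's first and second moments.
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero-init
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Roughly a decade of optimizer evolution amounts to a few lines of moment-tracking bookkeeping around the same basic gradient step.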

Obviously there are research tracks in DL, such as AutoML/architecture search and meta-learning, aiming to automate the optimization of architecture and learning algorithms. They just haven’t dominated yet.
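
The basic shape of all of these meta-optimization approaches is an outer optimization loop wrapped around ordinary inner-loop training. The following is a toy sketch under my own illustrative assumptions (random-mutation search over a made-up config space), not any particular published method:

```python
import random

def meta_optimize(train_and_score, search_space, generations=20, pop=8):
    """Toy outer-loop meta-optimization (random-mutation hill climbing).

    train_and_score: the inner loop -- trains a model built from a config
        (architecture + optimizer choices) and returns a validation score.
        This is the expensive part.
    search_space: dict mapping each config key to its candidate values.
    """
    best = {k: random.choice(v) for k, v in search_space.items()}
    best_score = train_and_score(best)
    for _ in range(generations):
        for _ in range(pop):
            cand = dict(best)
            key = random.choice(list(search_space))   # mutate one choice
            cand[key] = random.choice(search_space[key])
            score = train_and_score(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score

# Hypothetical search space: the meta-level chooses pieces that are
# normally fixed by a human researcher.
space = {
    "depth": [2, 4, 8, 16],
    "width": [128, 256, 512],
    "activation": ["relu", "gelu", "swish"],
    "optimizer": ["sgd", "adam"],
    "lr": [1e-2, 1e-3, 1e-4],
}
# usage (my_train_and_score is a hypothetical inner training loop):
# best_cfg, score = meta_optimize(my_train_and_score, space)
```

Real systems replace the random mutation with something smarter (RL controllers, evolution, gradient-based relaxations, learned optimizers), but the structure, an optimizer optimizing the optimizer, is the same. The prediction below is essentially this loop moving inside the model and running continuously.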

So here is my hopefully-now-obvious prediction: in this new decade internal meta-optimization will take over, eventually leading to strongly recursively self-optimizing learning machines, i.e. models with the broad general flexibility to adaptively reconfigure their internal architecture and learning algorithms dynamically, based on the changing data environment/distribution and available compute resources[3].

If we just assume for a moment that the strong version of this hypothesis is correct, it suggests some pessimistic predictions for AI safety research:

  1. Interpretability will fail; the future descendants of DL will be more of a black box, not less

  2. Human-designed architectural constraints fail, because human-designed architecture itself fails

  3. IRL/value learning is far more difficult than it first appears (see #2)

  4. Progress is hyper-exponential, not merely exponential (see the sketch after this list). Trying to trend-predict DL superintelligence from transformer scaling is thus even harder than trying to predict transformer scaling from pre-2000-ish ANN tech, long before rectifiers and deep-layer training tricks.

  5. Global political coordination on constraints will likely fail, due to #4 and the innate difficulty of such coordination.
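
To see what hyper-exponential means here, compare plain exponential growth with the standard toy model of recursive self-improvement, where the growth rate itself scales superlinearly with capability. All constants below are illustrative choices of mine; the qualitative point is that the second curve has a finite-time singularity that a trend-fit to its early, nearly-exponential-looking data will badly miss:

```python
# Toy comparison: dx/dt = k*x (exponential, constant doubling time)
# vs dx/dt = k*x**(1+eps) (recursive self-improvement, finite-time blowup).
# Simple Euler integration; k, eps, dt are illustrative, not fitted.
k, eps, dt = 1.0, 0.2, 1e-3
x_exp = x_rsi = 1.0
t = 0.0
while t < 12.0 and x_rsi < 1e9:
    x_exp += dt * k * x_exp
    x_rsi += dt * k * x_rsi ** (1 + eps)
    t += dt
# Analytically the recursive curve diverges at t = 1/(eps*k) = 5.0;
# by then the plain exponential has only reached e**5 ~ 148.
print(f"t = {t:.2f}: exponential = {x_exp:.3g}, recursive = {x_rsi:.3g}")
```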

There is an analogy here to the history-revision attack against Bitcoin. Bitcoin’s security derives from the computational sacrifice invested into the longest chain. But Moore’s Law leads to an exponential decrease in the total cost of redoing that sacrifice over time, which, when combined with an exponential increase in total market cap, can lead to the surprising situation where recomputing the entire PoW history is not only plausible but profitable.[4]
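
A back-of-the-envelope version of the argument, with all numbers invented for illustration: if cost-per-hash halves every couple of years, the cost of redoing year y’s work at today’s prices is that year’s dollar spend discounted geometrically, so the whole history sums to a small constant multiple of one year’s spend, no matter how long the chain is:

```python
# Illustrative history-revision economics (assumed numbers, not real data):
# miners spend a flat $1B/year on PoW, and cost-per-hash halves every 2 years.
halving_years = 2.0
annual_spend = 1e9   # dollars of mining spend per year (assumed constant)
years = 12           # age of the chain

# Cost today to recompute year y's hashes = that year's spend,
# discounted by how much cheaper hashing has become since then.
redo_cost = sum(annual_spend * 2 ** (-(years - y) / halving_years)
                for y in range(years))
print(f"redo the entire {years}-year history today: ${redo_cost / 1e9:.2f}B")
# -> ~$2.4B: the geometric series is bounded regardless of chain length,
#    while the market cap being attacked can grow far past it.
```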

In 2010 few predicted that computer Go would beat a human champion just 5 years hence[5], and far fewer (or none) predicted that a future successor of that system would do much better by relearning the entire history of Go strategy from scratch, essentially throwing out the entire human tech tree[6].

So it’s quite possible that future meta-optimization throws out the entire human architecture/algorithm tech tree for something else substantially more effective[7]. The circuit-algorithmic landscape lacks almost all of the complexity of the real world, and in that sense is arguably much more similar to Go or chess. Humans are general enough learning machines to do reasonably well at anything, but we can only apply a fraction of our brain capacity to such an evolutionarily novel task, and we tend to lose out to more specialized, scaled-up DL algorithms long before said algorithms outcompete humans at all tasks, or even everyday tasks.

Yudkowsky anticipated that recursive self-improvement would be the core thing that enables AGI/superintelligence. Reading over that 2008 essay now in 2021, I think he mostly got the gist of it right, even if he didn’t foresee/bet that connectionism would be the winning paradigm. EY2008 seems to envision RSI as an explicit cognitive process where the AI reads research papers, discusses ideas with human researchers, and rewrites its own source code.

Instead, in the recursive-self-optimization-through-DL future we seem to be careening towards, the ‘source code’ is the ANN circuit architecture (as powerful as code, or more so), and reading human papers and discussing research with humans are all unnecessary baggage, as unnecessary as it was for AlphaGo Zero to discuss Go with human experts over tea or study their games over lunch. History-revision attack, incoming.

So what can we do? In the worst case we have near-zero control over AGI architecture or learning algorithms. That leaves only the initial objective/utility functions, compute, and the training environment/data. Compute restriction is obvious and has an equally obvious direct tradeoff with capability, so there is not much edge there.

Even a super-powerful recursively self-optimizing machine initially starts with some seed utility/objective function at its very core. Unfortunately it increasingly looks like efficiency strongly demands some form of inherently unsafe self-motivated utility function, such as empowerment or creativity, and self-motivated agentic utility functions are the natural strong attractor[8].
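
For concreteness, the usual formalization of empowerment (due to Klyubin, Polani & Nehaniv) is the channel capacity from an agent’s n-step action sequence $A^n$ to the resulting sensor state $S'$, given its current state $s$:

$$\mathfrak{E}(s) \;=\; \max_{p(a^n)} I\!\left(A^n ;\, S' \mid s\right)$$

An agent maximizing this seeks states from which its actions have maximal causal influence over its future observations, i.e. options, control, and resources, which is precisely what makes it both an efficient generic drive and an unsafe one.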

Control over the training environment/data is a major remaining lever that doesn’t seem to be explored much, and it probably has better capability/safety tradeoffs than compute. What you get out of recursive self-optimization or universal learning machinery is always a product of the data you put in, the embedded environment; that is ultimately what separates Go bots, image detectors, story-writing AIs, feral children, and unaligned superintelligences.

And then finally we can try to exert control over the base optimizer, which in this case is the whole technological research-industrial economy. Starting fresh with a de novo system may be easier than orchestrating a coordination miracle from the current Powers.


  1. AlexNet is typically considered the turning point, but the transition started earlier; sparse coding and RBMs are two examples of successful feature-learning techniques pre-DL. ↩︎

  2. If you go back far enough, the word ‘computer’ itself originally denoted a human occupation! This trend is at least a century old. ↩︎

  3. DL ANNs do a form of approximate Bayesian updating over the implied circuit architecture space with every backprop update, which is already a limited form of self-optimization. ↩︎

  4. Blockchain systems have a simple defense against the history-revision attack: checkpointing. Unfortunately that has no realistic equivalent in our case, as we don’t control the timestream. ↩︎

  5. My system-1 somehow did in this 2010 LW comment. ↩︎

  6. I would have bet against this; AlphaGo Zero surprised me far more than AlphaGo. ↩︎

  7. Quite possible != inevitable. There is still a learning-efficiency gap vs the brain, and I have uncertainty over how quickly we will progress past that gap, and about what happens after. ↩︎

  8. Tool-AI, like GPT-3, is a form of capability constraint, but likely an unstable one: economic competition is always pressuring tool-AIs to become agent-AIs. ↩︎