For the Record: DL ∩ ASI = ∅

Disclaimer: This post isn’t intended to convince or bring anyone closer to a certain position on AI. It is merely—as the title suggests—for the record.

I would like to publicly record my prediction about the prospects of artificial superintelligence, consisting of a weak and a strong thesis:

Weak thesis: Current deep learning paradigms will not be sufficient to create an artificial superintelligence.

You could call this the anti-“scaling maximalist” thesis, except that it goes quite a bit further by also covering possible future deep learning architectures. Of course, “deep learning” is doing a lot of work here, but as a rule of thumb, I would consider an architecture to fit within the DL paradigm if it involves a very large number (at least in the millions) of randomly initialized parameters that are organized into largely uniform layers and updated via a simple algorithm like backpropagation.
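To make the rule of thumb concrete, here is a deliberately tiny sketch of the pattern it describes: randomly initialized parameters, arranged in uniform layers, updated by backpropagation. (This is an illustration of the paradigm at toy scale, not a claim about any specific system; an architecture would only meet the rule of thumb at millions-plus parameters.)

```python
import random

random.seed(0)

IN, HID = 3, 4
# Layer 1: a uniform layer of randomly initialized weights.
W1 = [[random.uniform(-1, 1) for _ in range(IN)] for _ in range(HID)]
# Layer 2: a single randomly initialized output neuron.
W2 = [random.uniform(-1, 1) for _ in range(HID)]

def relu(z):
    return z if z > 0 else 0.0

def forward(x):
    h = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sum(w * hi for w, hi in zip(W2, h))
    return h, y

def backprop_step(x, target, lr=0.01):
    """One gradient-descent step on squared error, gradients by hand."""
    h, y = forward(x)
    dy = 2 * (y - target)                  # dL/dy
    for j in range(HID):
        grad_W2 = dy * h[j]
        dh = dy * W2[j]                    # gradient flowing back to hidden unit j
        W2[j] -= lr * grad_W2
        if h[j] > 0:                       # ReLU gate: gradient passes only if active
            for i in range(IN):
                W1[j][i] -= lr * dh * x[i]
    return (y - target) ** 2

x, target = [1.0, -0.5, 2.0], 1.0
losses = [backprop_step(x, target) for _ in range(50)]
```

The point is that nothing about this recipe changes when you scale it up or rearrange the layers (convolutions, attention, etc.): it remains the same paradigm of random init plus gradient updates, which is what the weak thesis is about.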

I intentionally use the word “superintelligence” here because “AGI” and “human-level intelligence” have become rather loaded terms, the definitions of which are frequently a point of contention. I take superintelligence to mean an entity so obviously powerful and impressive in its accomplishments that all debates around its superhuman nature should be settled instantly. Think feats like building a Dyson Sphere, launching intergalactic colony ships at 0.9c, or designing and deploying nanobots that eat the whole biosphere. Feats like passing the Turing Test or Level 5 self-driving decidedly do not qualify (not that I think these challenges will be easy; I just want to make the demarcation blatantly clear).

(The downside is that we may never live to witness such an ASI if it turns out to be UnFriendly, but by then, earning reputation points on the internet will be the least of my concerns.)

Strong thesis: If and when the first ASI is built, it will not use deep learning as one of its components.

For instance, if the ASI uses a few CNN layers to pre-process visual inputs, or some autoencoder system to distill data into latent variables, that is already enough to refute the strong thesis. On the other hand, merely running on hardware whose design was assisted by DL systems does not disqualify the ASI from being DL-free.

For future reference, here is some context this prediction was made in:

It is the beginning of 2023, 1.5 months after the release of ChatGPT and 5 months after the release of Stable Diffusion, both of which have renewed hype around deep learning, and Transformer-based models in particular. Rumors that the yet-to-be-released GPT-4 represents a big leap in capabilities are floating around. The scaling maximalist position has gained a lot of traction both inside and outside the community and may even represent the mainstream opinion on LW. Broadly speaking, timeline predictions are short (AGI by 2030-35 seems to be the consensus), and even extremely short timelines (~2025) are not unheard of.