Training the AGI may not be the expensive part. If current model architectures are flawed (their neural networks are not robust enough, or they lack the topology needed to solve "AGI"-grade cognitive tasks), then we need to search three spaces:
1. Network subcomponents: activation functions, and larger blocks such as transformer layers.
2. Network architectures: e.g. "an n x m block of <architecture type X>, feeding into an n x m block of <architecture type Y>".
3. Cognitive architectures: e.g. "system-1 output from a network of architecture type C feeds into a task meta-controller that, based on confidence, either routes the output to the robotics estimation module or...". These are collections of modules, some of which may not use neural networks at all, that together form the machine's cognitive architecture.
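The three-level search above can be sketched as a simple random-search loop. Everything here is illustrative: the subcomponent names, module names, scoring function, and thresholds are assumptions invented for the sketch, not a real system or benchmark.

```python
import random

# Hypothetical names for the three search levels described above.
SUBCOMPONENTS = ["relu", "gelu", "transformer_block", "ssm_block"]
MODULES = ["system1_net", "meta_controller", "robotics_estimator"]

def sample_candidate(rng):
    """Sample one candidate spanning all three levels of the search space."""
    return {
        # Level 1: which subcomponent the networks are built from.
        "subcomponent": rng.choice(SUBCOMPONENTS),
        # Level 2: the n x m dimensions of each stacked architecture block.
        "layer_dims": [(rng.choice([512, 1024]), rng.choice([512, 1024]))
                       for _ in range(rng.randint(2, 4))],
        # Level 3: which modules the cognitive architecture wires together.
        "modules": rng.sample(MODULES, k=rng.randint(1, len(MODULES))),
    }

def benchmark_score(candidate, rng):
    """Stand-in for the 'AGI benchmark'; returns a score in [0, 1]."""
    return rng.random()  # a real system would train and evaluate here

def search(n_candidates=100, pass_threshold=0.99, seed=0):
    """Build candidates until one passes the bench or the budget runs out."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        cand = sample_candidate(rng)
        score = benchmark_score(cand, rng)
        if best is None or score > best[0]:
            best = (score, cand)
        if score >= pass_threshold:
            break
    return best

best_score, best_candidate = search()
```

Even this toy loop makes the cost structure visible: every iteration of the outer loop stands in for a full, expensive training-and-evaluation run.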
Technically, an AGI is whatever combination of architectures achieves "AGI-level performance" (by whatever heuristic we choose) on a large and diverse benchmark of "AGI-level tasks": tasks hard enough that 50% of humans fail them, or 99.9%, depending on your AGI definition.
A superintelligence would be a machine that both passes a large AGI benchmark and scores better than all living humans on most tasks (in the statistical sense: enough standard deviations above the human mean that fewer than 1 in 8 billion humans are expected to be that good).
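The "1 in 8 billion" criterion can be made concrete, assuming (a strong simplification) that human task performance is normally distributed:

```python
from statistics import NormalDist

# How many standard deviations above the human mean must a score be
# before fewer than 1 in 8 billion humans are expected to reach it?
tail_probability = 1 / 8e9          # ~1.25e-10
z = NormalDist().inv_cdf(1 - tail_probability)
print(f"about {z:.1f} standard deviations")
```

This comes out to roughly 6.3 sigma, so under this definition a superintelligence would need to sit more than six standard deviations above the human mean on most tasks.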
So, if you think about it, searching this space (by building many failed AGI candidates to gain information about the possibility space) could eat up many OOMs more compute than training a single model. Suppose, for example, we have to build 1e6 full AGI models, each one failing the benchmark but some doing better than others.
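The back-of-envelope arithmetic for that example is simple. The per-run FLOP cost below is an arbitrary placeholder; only the ratio matters:

```python
import math

# Assumed cost of one AGI-scale training run (placeholder value).
flops_per_training_run = 1e25
# Number of full candidates built during the search, per the example.
n_candidates = 1e6

search_flops = n_candidates * flops_per_training_run
# Extra orders of magnitude the search adds over a single run.
extra_ooms = math.log10(search_flops / flops_per_training_run)
print(f"total: {search_flops:.1e} FLOPs, +{extra_ooms:.0f} OOMs")
```

A million candidates means six extra orders of magnitude over training the one model that finally passes, which is the sense in which the search, not the final training run, dominates the cost.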
It may not be this difficult, though. It's possible that ANY architecture with something filling each of the minimum essential roles, plus sufficient compute to train it, will pass the bench.