[Crosspost] AlphaTensor, Taste, and the Scalability of AI


My big takeaway from the AlphaTensor paper is that DeepMind have extraordinary taste: they’re able to identify problems that their approach to AI can tackle well, today; and they can figure out how to regiment those problems in such a way that the AI can tackle them:

Their approach is a variant of the deep reinforcement-learning-guided Monte Carlo tree search that they have applied so successfully to playing Chess and Go. What they have done, very effectively, is to design a game whose objective is to find the most efficient matrix multiplication algorithm for matrices of given dimensions.
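To make that game concrete: multiplying two n×n matrices corresponds to a fixed three-dimensional tensor, and an algorithm that uses r scalar multiplications corresponds to a decomposition of that tensor into r rank-1 terms. Each move in AlphaTensor’s game subtracts one rank-1 term from the residual tensor, and the game is won when the residual reaches zero in as few moves as possible. Here’s a minimal sketch of the object being played over, using Strassen’s classic rank-7 decomposition of the 2×2 case (the numpy encoding and row-major vectorisation are my own illustration, not code from the paper):

```python
import numpy as np

# The 2x2 matrix multiplication tensor T: T[i, j, p] = 1 exactly when
# entry i of vec(A) times entry j of vec(B) contributes to entry p of vec(C),
# with all matrices vectorised row-major.
n = 2
T = np.zeros((n * n, n * n, n * n), dtype=int)
for r in range(n):
    for c in range(n):
        for k in range(n):
            T[r * n + k, k * n + c, r * n + c] = 1

# Strassen's rank-7 decomposition: seven (u, v, w) triples, one per scalar
# multiplication M_r = (u_r . vec(A)) * (v_r . vec(B)), with vec(C) = sum_r w_r * M_r.
# Each row below is one rank-1 "move" in the game.
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

# Subtracting all seven rank-1 terms from T leaves the zero tensor: the
# decomposition exactly reproduces 2x2 matrix multiplication in 7 products.
residual = T.copy()
for u, v, w in zip(U, V, W):
    residual = residual - np.einsum("i,j,k->ijk", u, v, w)
assert not residual.any()
print("Strassen's 7 rank-1 terms exactly cancel the 2x2 matmul tensor.")
```

Finding a shorter decomposition for a bigger tensor is exactly the paper’s headline result: for example, 47 multiplications for 4×4 matrices over Z₂, beating the 49 you get by applying Strassen recursively.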

On the presupposition that we don’t get AGI, much more AI research will look a lot like this. Find a good question, figure out how to regiment the question into the constraints imposed by the model’s architecture, and apply the model to answer the question.

But those skills don’t look like the sort of thing that benefits from scaling compute:

This ability – being able to reorganise a question in the form of a model-appropriate game – doesn’t look nearly as susceptible to Moore’s Law-style exponential speed-ups. Researchers’ insights and abilities – in other words, researcher productivity – don’t scale exponentially. It takes time, energy, and the serendipity of a well-functioning research lab to cultivate them. Driving the cost of compute down to effectively zero doesn’t help if we’re not using these models to attack the right issues in the right way.

So the bottleneck won’t be compute! The bottleneck will be the sort of excellent taste that DeepMind keep displaying.