In his TED talk, Eliezer guesses superintelligence will arrive after “zero to two more breakthroughs the size of transformers.” I’ve heard others voice similar takes. But I haven’t heard much discussion of which number it is.
As an aside, I think the algorithmic efficiency improvement since transformers has arguably amounted to much more than twice the innovation that transformers themselves represented. E.g. Epoch estimates here that transformers account for 23% of the algorithmic improvement over the period starting with their publication.
Also, note that “Attention is all you need” didn’t invent self-attention; it demonstrated that you can build a language model with just self-attention (and MLPs) and no recurrence. Self-attention itself had been introduced in several earlier papers (I think the previous year).