My reaction would be “sure, that sounds like exactly the sort of thing that happens from time to time”.
Insights trickle in slowly, and over the long run you can see vast efficiency improvements. But this seems unrealistically fast. Would you really believe that a single person or team did something which, if true, would completely and radically reshape the field of computer vision, just because "it happens from time to time"?
In fact, if you replace the word "memory" with either "data" or "compute", then this has already happened on the training side of things with the advent of transformer architectures, just within the past few years.
Transformers are impressive, but how much of their usefulness comes from efficiency gains due to better representations of the data? Not by orders of magnitude, I argue. OpenAI recently ran this comparison against LSTMs, and this was their result.