Wait, people are doing this instead of just turning words into numbers and having models learn those? Anything GPT-sized that’s getting results?
Not totally sure. There are advantages to character-level models, e.g. you can represent Twitter handles (which a word-embedding-based approach can have trouble with). People have definitely trained character-level RNNs in the past. But I don’t know enough about NLP to say whether people have trained large models at the character level. (GPT uses byte pair encoding.)
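To make the Twitter-handle point concrete, here is a minimal sketch in pure Python (no real tokenizer library; the toy vocabulary and handle are made up for illustration). A fixed word-level vocabulary collapses any unseen string to an unknown token, while a character-level scheme can represent arbitrary strings; BPE, which GPT uses, sits in between.

```python
# Toy word-level vocabulary; anything outside it becomes <UNK>.
vocab = {"the": 0, "quick": 1, "brown": 2, "fox": 3, "<UNK>": 4}

def word_level(text: str) -> list[int]:
    # Out-of-vocabulary tokens all collapse to the same <UNK> id,
    # so the model can't distinguish one unseen handle from another.
    return [vocab.get(tok, vocab["<UNK>"]) for tok in text.split()]

def char_level(text: str) -> list[int]:
    # Every string is representable: one id per character (codepoint).
    return [ord(c) for c in text]

handle = "@quick_fox99"  # hypothetical Twitter handle
print(word_level(handle))  # [4] -- the whole handle is out-of-vocabulary
print(char_level(handle))  # one id per character; nothing is lost
```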
It isn’t clear why this can’t be rescued with counterfactuals.
I suspect Alex would say that it isn’t clear how to define a “counterfactual” given the constraints he’s working with (all you get is a closed physical system and a region of space within that system).