Wait, people are doing this instead of just turning words into numbers and having models learn those? Anything GPT-sized that’s getting results?
Not totally sure. There are advantages to character-level models, e.g. you can represent Twitter handles (which a word-embedding-based approach can have trouble with). People have definitely trained character-level RNNs in the past. But I don’t know enough about NLP to say whether people have trained large models at the character level. (GPT uses byte pair encoding.)
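To make the Twitter-handle point concrete, here is a minimal sketch in pure Python (no real tokenizer library; the toy vocabulary and handle are made up for illustration). A fixed word-level vocabulary collapses any unseen string to an unknown token, while a character-level scheme can represent arbitrary strings; BPE, which GPT uses, sits in between.

```python
# Toy word-level vocabulary; anything outside it becomes <UNK>.
vocab = {"the": 0, "quick": 1, "brown": 2, "fox": 3, "<UNK>": 4}

def word_level(text: str) -> list[int]:
    # Out-of-vocabulary tokens all collapse to the same <UNK> id,
    # so the model can't distinguish one unseen handle from another.
    return [vocab.get(tok, vocab["<UNK>"]) for tok in text.split()]

def char_level(text: str) -> list[int]:
    # Every string is representable: one id per character (codepoint).
    return [ord(c) for c in text]

handle = "@quick_fox99"  # hypothetical Twitter handle
print(word_level(handle))  # [4] -- the whole handle is out-of-vocabulary
print(char_level(handle))  # one id per character; nothing is lost
```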
It isn’t clear why this can’t be rescued with counterfactuals.
I suspect Alex would say that it isn’t clear how to define a “counterfactual” given the constraints he’s working with (all you get is a closed physical system and a region of space within that system).