I don’t buy that argument at all. “Text space” seems to have been adequate to get to GPT-3, which is incredibly impressive and useful in a variety of ways. Furthermore, what proof do you have that the resulting insights wouldn’t transfer to multimodal systems like GPT-4 (which can see) or PaLM-E (which is embodied and can both see and operate in “text space”)? Moreover, I’m not the first to point out that text space seems to incentivize models to develop highly sophisticated thinking abilities, which seems like the more important thing to focus on.
You seem to be making a very general cloud of claims about the impressiveness of transformers. I was making a very specific claim about the system described in the post and the sense in which it’s not myopic.