I find the social implications implausible. Even if the technical ability is there, and “the tools teach themselves,” social inertia is very high. In my model, it takes years from when GPT whispering is actually cheap, available, and useful to when it becomes so normal that bashing it will be cool ala the no-GPT dates.
Building good services on top of the API takes time, too.
I find the whole context window gets solved prediction unrealistic. There are alternatives proposed but they obviously weren’t enough for GPT4. We don’t even know whether GPT4-32k works well with a large context window or not, regardless of its costs. It’s not obvious that the attention mechanism can just work with any long-range dependency. (Perhaps an O(n^3) algorithm is needed for dependencies so far away.) The teacher-forcing autoregressive training might limit high-quality generation length, too.
I find the social implications implausible. Even if the technical ability is there, and “the tools teach themselves,” social inertia is very high. In my model, it takes years from when GPT whispering is actually cheap, available, and useful to when it becomes so normal that bashing it will be cool ala the no-GPT dates.
Building good services on top of the API takes time, too.
I find the whole context window gets solved prediction unrealistic. There are alternatives proposed but they obviously weren’t enough for GPT4. We don’t even know whether GPT4-32k works well with a large context window or not, regardless of its costs. It’s not obvious that the attention mechanism can just work with any long-range dependency. (Perhaps an O(n^3) algorithm is needed for dependencies so far away.) The teacher-forcing autoregressive training might limit high-quality generation length, too.