Indeed, but to slightly counterbalance this, it looks like it was trained on ~500B tokens (while ~300B were used for GPT-3 and something like ~50B for GPT-2).
Good point. Still, there is room for a few more orders of magnitude of increase in data, and in parameters.
I wonder if there is any major plan to greatly expand the context window? Or perhaps to add a sort of “inner voice”/chain of thought, so the model can write down its intermediate computational steps and refer back to them later? I’m aware the context window has so far tended to grow along with parameter count.
Correct me if I’m wrong, but a context window of even 20,000 tokens could be enough for it to reliably imitate a human’s short-term memory and consistently pass a limited Turing Test (e.g. the same one Eugene Goostman barely scraped by in ~2014), as opposed to the constant forgetting of LSTMs and Markov chains. Sure, the Turing Test isn’t particularly useful as an AI benchmark, but the market for advanced conversational agents could be a trillion-dollar business, and the average Joe is far more susceptible to the ELIZA effect than we commonly assume.
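To make the "short-term memory" idea concrete, here is a rough sketch of how a chat agent might keep only as much recent conversation as fits in a fixed context window. Everything in it is an assumption for illustration: the names `count_tokens` and `fit_to_window` are made up, whitespace-separated words stand in for real model tokens, and the 20,000 default is just the figure floated above.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real model tokens.
    return len(text.split())


def fit_to_window(turns, max_tokens=20_000):
    """Return the most recent turns whose combined token count fits the window."""
    kept = deque()
    used = 0
    # Walk backwards from the newest turn, keeping turns until the budget is spent.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break  # this turn and everything older is "forgotten"
        kept.appendleft(turn)
        used += cost
    return list(kept)
```

With a tiny budget the oldest turns drop out first, e.g. `fit_to_window(["a b c", "d e", "f"], max_tokens=3)` keeps only the last two turns. A larger window just pushes the point of forgetting further back; it does not remove it.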