I’m not sure what you mean by “running on fewer resources”—it’s a 175 billion parameter model, so it has the same minimum hardware requirements as GPT-3.
Their inference setup likely uses the same hardware per-response, but I’d guess OpenAI has much faster kernels so Meta would need more hardware to serve a given traffic rate. However, that’s totally unrelated to the quality of each response.
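As a rough back-of-envelope sketch (my own numbers, not from the thread): just holding 175 billion parameters in memory takes hundreds of gigabytes before you count activations or KV cache, which is why the minimum hardware floor is the same no matter who is serving the model.

```python
# Back-of-envelope: memory needed just to hold a 175B-parameter model's
# weights, ignoring activations, KV cache, and optimizer state.
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    return n_params * bytes_per_param / 1e9

params = 175e9
fp16_gb = weight_memory_gb(params, 2)   # 16-bit weights
fp32_gb = weight_memory_gb(params, 4)   # 32-bit weights

print(f"fp16: {fp16_gb:.0f} GB, fp32: {fp32_gb:.0f} GB")
# fp16 alone is ~350 GB: far more than any single GPU, so inference has
# to be sharded across several accelerators whoever runs it.
```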
BlenderBot is based on their largest “OPT” model, presumably fine-tuned somehow. OPT’s training run explicitly imitated GPT-3’s hyperparameters after Meta couldn’t work out better ones, and I’d add that it’s pretty badly under-trained even by pre-Chinchilla scaling laws.
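To make “under-trained” concrete, here is a quick check. Both figures are my own assumptions, not from the thread: OPT-175B reportedly saw roughly 180B training tokens, and the Chinchilla heuristic calls for about 20 tokens per parameter.

```python
# Compare OPT-175B's assumed training budget against the Chinchilla
# rule of thumb of ~20 tokens per parameter (both figures assumed here).
params = 175e9
tokens_seen = 180e9              # assumed: OPT's reported training tokens
chinchilla_tokens = 20 * params  # Chinchilla-style compute-optimal budget

print(f"tokens per parameter: {tokens_seen / params:.2f}")
print(f"Chinchilla-optimal tokens: {chinchilla_tokens / 1e12:.1f}T")
# ~1 token per parameter vs a recommended ~20: under-trained by more
# than an order of magnitude on this heuristic alone.
```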
(all views my own, you know the drill)
It seems you are right. I previously thought that BlenderBot was supposed to be usable outside of a research environment.
I read through the description, and a key difference seems to be that BlenderBot actually has long-term memory about the user it interacts with.
Except that it doesn’t, really! The interpretations it stores in “memory” (which is just a small notepad) are so out-of-sync with what went on earlier in the conversation that I think it may be doing worse with that memory than without it. As an anecdotal observation: I asked what it thought of Eliezer Yudkowsky’s take on AI ethics, and it responded by asking me if I knew that [insert first sentence of wiki page about Yudkowsky here], and stored the “memory” that I enjoyed Harry Potter (presumably because of Yud’s fanfic?). It then proceeded to tell me that God is real and I should believe in Him, which, assuming that Yudkowsky is not in fact God, had nothing to do with anything previously discussed.

It definitely is not usable outside of a research environment, which Meta should have known, but nonetheless their PR pitch has been promoting it as giving general consumers free access to a cutting-edge conversational chatbot (which it really, really isn’t).
That sounds like it performs worse precisely because it tries to use memory at all. That seems like an okay tradeoff if you want memory to eventually work well in the long term.
I expect that you are wrong about that and that there are people who have fun chatting with it. Even if it’s an MVP that’s not very powerful, there’s still a good chance that some people will find it usable.
To improve the product, it’s important to have people using it and providing more data, so it makes sense to announce it in a way that gets more people to use it.