I don’t see how we could ever get superhuman intelligence out of GPT-3. My understanding is that the goal of GPT neural nets is to predict the next token of web text written by humans. As N → ∞, GPT-N will become perfect at creating text that could have been written by the average internet user.
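(Concretely, “predict the next token” means minimizing the standard autoregressive cross-entropy loss over the training corpus; this is just the standard notation, nothing OpenAI-specific:)

$$\mathcal{L}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t})$$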
But the average internet user isn’t that smart! Let’s say there’s some text on the internet that reads, “The simplest method to break the light speed barrier is...” The most likely continuation of that text will not be an actual method to break the light speed barrier! It’ll probably be some technobabble from a sci-fi story. So that’s what we’ll get from GPT-N!
Have you seen InstructGPT?
I hadn’t until you mentioned it here. I have now read through an explanation of InstructGPT by OpenAI here. My understanding is that in this case the optimization is for GPT-3 outputs that will be liked by the humans in the reinforcement-learning-from-human-feedback (RLHF) loop.
The OpenAI people say, “One way of thinking about this process is that it ‘unlocks’ capabilities that GPT-3 already had, but were difficult to elicit through prompt engineering alone.” Which I guess kind of points at the problem I was thinking of. GPT-N is optimized to predict the next token based on a bunch of internet text. All the add-ons are trying to take advantage of that optimizer to accomplish different tasks. They’re doing a good job at that, but what the big compute is optimizing for remains the same.
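For concreteness, the first stage of that RLHF pipeline trains a separate reward model on human preference comparisons, and the policy is then fine-tuned against it. Here’s a toy sketch of the reward-model step (the toy features and sizes are mine, not OpenAI’s actual code):

```python
# Toy sketch of RLHF's reward-model step (illustrative only; the 16-dim
# "features" stand in for the language model's hidden states).
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Each row featurizes a (prompt + completion) pair; labelers preferred
# `chosen` over `rejected` for the same prompt.
chosen = torch.randn(64, 16)
rejected = torch.randn(64, 16)

for step in range(100):
    # Bradley-Terry preference loss: push r(chosen) above r(rejected).
    loss = -torch.nn.functional.logsigmoid(
        reward_model(chosen) - reward_model(rejected)
    ).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The fine-tuning stage (PPO in the InstructGPT paper) then maximizes this
# learned reward -- exactly the "optimize for outputs humans like" step
# described above.
```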
On a slightly different note, this paper kind of reinforces my current thoughts that alignment is being co-opted by social justice types. OpenAI talks about alignment as if it’s synonymous with preventing GPT from giving toxic or biased responses. And that’s definitely not AI alignment! Just read this quote: “For example, when generating text that disproportionately affects a minority group, the preferences of that group should be weighted more heavily.” It’s disgusting! Like, this is really dangerous. It would be horribly undignified if alignment researchers convinced policy makers that we need to put a lot of effort into aligning AI, and then the policy makers made some decree that AI text can’t ever say the word “bitch,” as if that were a solution.
ETA: Pretty troubling that this is where we’re stuck on alignment while Google has already made their improved version of GPT-4 and OpenAI has created a new artistic neural net that’s way better than anything we’ve ever seen. I still think it’s not too troubling, though, if they keep using these methods that plateau at the level of human ability. It might be an interesting future if AI is stuck at human-level thought for a while.
The problem isn’t that people are trying to parent AIs into not being assholes via social justice knowledge; the problem is that the people receiving the social justice knowledge are treating it as an attempt to avoid being canceled, when they need to be seeking out ways to turn it into constructive training data. Social justice knowledge is actually very relevant here: align the training data, (mostly) align the AI. Worries about quality of generalization are very valid, and the post about reward model hacking is a good introduction to why reinforcement learning is a bad idea. However, current unsupervised learning only “desires” to output truth. Ensuring that the training data represents a convergence process from mistakes toward true social justice seems like a very promising perspective to me, and not one to trivially dismiss. Ultimately, AI safety is most centrally a parenting, psychology, and vibes problem, with some additional constraints due to issues with model stability, reflection, sanity, and “AI psychiatry.”
Also, AI is not plateauing.
The average internet user isn’t smart, but you can set up the context such that GPT-3 expects something smart.
You can already observe this difference with GPT-3. If you set up a conversation between an AI and a human carelessly, GPT-3 is quite dumb, presumably because the average conversation with an AI assistant in the training data is quite dumb. But if you give a few smart responses from the AI as part of the context, the continuations become much smarter.
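You can see this directly in the API. A minimal sketch with the 2022-era openai Python library (the prompts and model name here are just my illustration):

```python
# Compare a careless context vs. a "smart" primed context (illustrative).
import openai

openai.api_key = "sk-..."  # your key here

careless = "Human: How do planes stay up?\nAI:"

primed = (
    "The following is a conversation with a brilliant, careful AI.\n"
    "Human: Why is the sky blue?\n"
    "AI: Rayleigh scattering: shorter wavelengths scatter more strongly,\n"
    "so blue light reaches your eye from all directions.\n"
    "Human: How do planes stay up?\nAI:"
)

for prompt in (careless, primed):
    resp = openai.Completion.create(
        model="davinci", prompt=prompt, max_tokens=64, temperature=0.7
    )
    print(resp.choices[0].text.strip(), "\n---")
```

In my experience the second prompt reliably gets you a more careful continuation from the very same weights, which is the whole point: the model’s apparent intelligence depends on which part of its distribution you condition on.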
Also, I think it’s more helpful to view it as a 2-stage problem: 1) get a neural net to build a world model, and 2) query that world model. The first happens during training, the second during deployment. It’s not clear that the first is limited to human-level intelligence; since the task of GPT-3 is open-ended, wouldn’t getting better and better world models always be helpful? And once you have the world model, well, let’s just say I wouldn’t be comfortable betting on us being unable to access it. At the very least, you could set up a conversation between two of the smartest people in the world.
In the limit, GPT-N models the entire Earth that created the Internet in order to predict text completions. Before then, it will invent new methods of psychoanalyzing humans to better infer correlations in text. So it surely has superhuman capabilities; it’s just a matter of accessing them.
My understanding is that it’s possible there’s a neural net along the path GPT-1 → GPT-N that plateaus at perfectly predicting the next token of human-written text while stopping way short of having to model the entire Earth. And that would basically be a human internet poster, right? If you create one of those, then training it with more text, more parameters, and more compute won’t make a neural net that models the Earth. It’ll just make that same neural net that works perfectly on its own, with a bunch of extra wasted space.
I’m not too sure my understanding is correct though.