Have you seen instruct GPT?
I hadn’t until you mentioned it here. I have now read through OpenAI’s explanation of InstructGPT here. My understanding is that in this case the optimization target is GPT-3 outputs that will be rated highly by the humans in the reinforcement learning from human feedback (RLHF) pipeline.
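To make that concrete, here is a minimal sketch (my own toy Python, not OpenAI’s actual code) of the pairwise loss typically used to fit a reward model to human preference labels; the reward model then stands in for the “humans liking the output” signal during RL:

```python
import math

def reward_model_loss(r_preferred: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss for reward-model training.

    Labelers compare two model outputs; the reward model is trained so
    that the output the human preferred gets the higher scalar reward.
    The loss is -log(sigmoid(r_preferred - r_rejected)), which shrinks
    as the margin in favor of the preferred output grows.
    """
    margin = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A reward model that already ranks the preferred output higher incurs
# a smaller loss than one that ranks it lower.
good = reward_model_loss(2.0, 0.0)  # preferred output scored higher
bad = reward_model_loss(0.0, 2.0)   # preferred output scored lower
```

The point of the sketch is just that everything the humans reward gets baked into one scalar signal, which the policy is then optimized against.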
The OpenAI people say: “One way of thinking about this process is that it ‘unlocks’ capabilities that GPT-3 already had, but were difficult to elicit through prompt engineering alone.” Which I think points at exactly the problem I had in mind. GPT-N is optimizing for predicting the next token over a large corpus of internet text. All the add-ons are trying to steer that optimizer toward different tasks. They do a decent job of it, but what the big compute is optimizing for remains the same.
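For contrast, here is a toy illustration (again my own sketch, not GPT’s implementation) of that base objective: pretraining minimizes cross-entropy on the actual next token, and nothing in the loss mentions tasks, helpfulness, or safety:

```python
import math

def next_token_loss(predicted_probs: dict, actual_next: str) -> float:
    """Cross-entropy loss for a single prediction step.

    The model assigns a probability to each candidate token; the loss is
    -log(probability assigned to the token that actually came next in
    the training text). The objective is purely to match the corpus.
    """
    return -math.log(predicted_probs[actual_next])

# Toy vocabulary for continuing "the cat sat on the ...". If the corpus
# text really says "mat", the loss rewards having put mass on "mat".
probs = {"mat": 0.7, "hat": 0.2, "dog": 0.1}
loss_if_mat = next_token_loss(probs, "mat")  # low loss
loss_if_dog = next_token_loss(probs, "dog")  # high loss
```

Whatever an add-on like InstructGPT layers on top, this is the objective the bulk of the compute was spent on.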
On a slightly different note, this paper reinforces my current impression that alignment is being co-opted by social justice types. OpenAI talks about alignment as if it were synonymous with preventing GPT from giving toxic or biased responses. And that’s definitely not AI alignment! Just read this quote: “For example, when generating text that disproportionately affects a minority group, the preferences of that group should be weighted more heavily.” It’s disgusting! Like, this is really dangerous. It would be horribly undignified if alignment researchers convinced policymakers that we need to put serious effort into aligning AI, and the policymakers then decreed that AI text can never say the word “bitch,” as if that were a solution.
ETA: It’s pretty troubling that this is where we’re stuck on alignment while Google has already made its improved version of GPT-4 and OpenAI has created a new artistic neural net that’s way better than anything we’ve seen before. I still don’t think it’s too troubling, though, if they keep using these methods that plateau at the level of human ability. It might be an interesting future if AI is stuck at human-level thought for a while.
The problem isn’t that people are trying to parent AIs into not being assholes via social justice knowledge; the problem is that the people receiving that knowledge treat it as an attempt to avoid being canceled, when they should be seeking out ways to turn it into constructive training data. Social justice knowledge is actually very relevant here: align the training data, and you (mostly) align the AI. Worries about quality of generalization are very valid, and the post about reward model hacking is a good introduction to why reinforcement learning is a bad idea. However, current unsupervised learning only “desires” to output truth. Ensuring that the training data represents a convergence process from mistakes toward true social justice seems like a very promising perspective to me, and not one to trivially dismiss. Ultimately, AI safety is most centrally a parenting, psychology, and vibes problem, with some additional constraints due to issues with model stability, reflection, sanity, and “AI psychiatry.”
Also, AI is not plateauing.