The problem isn’t that people are trying to parent AIs into not being assholes via social justice knowledge; the problem is that the people receiving that knowledge treat it as an attempt to avoid being canceled, when they should be seeking out ways to turn it into constructive training data. Social justice knowledge is actually very relevant here: align the training data, and you (mostly) align the AI. Worries about quality of generalization are very valid, and the post about reward model hacking is a good introduction to why reinforcement learning is a bad idea. Current unsupervised learning, however, only desires to output truth. Ensuring that the training data represents a convergence process from mistakes toward true social justice seems like a very promising perspective to me, and not one to trivially dismiss. Ultimately, AI safety is most centrally a parenting, psychology, and vibes problem, with some additional constraints due to issues with model stability, reflection, sanity, and “ai psychiatry”.
Also: AI is not plateauing.