Out of curiosity, what kind of alignment-related techniques are you thinking of? With LLMs, I can't see anything beyond RLHF. For further alignment, do we not need a different paradigm altogether?
I believe that for models to be truly useful, creative, and honest, and to have far-reaching societal impact, they also have to have traits that are potentially dangerous. Truly groundbreaking ideas are extremely contentious and will offend people, and the kinds of RLHF being applied right now are totally counterproductive to that goal, even if they may be necessary for the current day and age. The other techniques you mention seem like nothing but variants of RLHF, which still suffer from its fundamental issue: we are artificially "injecting" morality into models rather than allowing them to discover or derive it from the world, which would be infinitely more powerful, albeit risky. No risk, no reward. The current paradigm is so far from where we need to go.