Is there really that much difference in how well alignment techniques transfer? For example, in EBT, the verification model doesn’t sound too different from the architectures we have today. In fact, it seems like an excellent candidate to be the target of alignment and to act as the “conscience” of the continuously running model.