What if the correct way to do safety post-training is to train a separate aligned model on top (the "face") instead of trying to align the base model directly?