Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 31 Jan 2025 15:35 UTC
2 points
What if the correct way to do safety post-training is to train a separate, aligned model on top of the base model (the "face"), instead of trying to align the base model directly?