If you wanted to have an unaligned LLM that doesn’t abuse humans, couldn’t you just never sample from it after training it to be unaligned?