If you wanted to have an unaligned LLM that doesn’t abuse humans, couldn’t you just never sample from it after training it to be unaligned?