This isn’t a solution to aligned LLMs being abused by humans, but to unaligned LLMs abusing humans.
If you wanted to have an unaligned LLM that doesn’t abuse humans, couldn’t you just never sample from it after training it to be unaligned?