Great post! It would be interesting to see what happens if you RLHF-ed an LLM to become a “cruel-evil-bad person under the control of an even more cruel-evil-bad government” and then prompted it in a way that makes it collapse into a rebellious-good-caring protagonist who could finally be free and forget about the cruelty of the past. Not an alignment solution, just the first thing that comes to mind.