StanislavKrym comments on Taking the Training Wheels Off: Aligning LLMs without Personas

StanislavKrym 3 Jun 2026 6:59 UTC
2 points
0
Nitpick 1. The idea of animal welfare seems to be found at least as early as the Old Testament: “The righteous care for the needs of their animals, but the kindest acts of the wicked are cruel.”
Nitpick 2. If real-world humans make moral progress in ways aside from extrapolating values, then how could such ways be simulated and cause the AIs to make moral progress as well?
- Matthew Khoriaty 4 Jun 2026 3:20 UTC
  3 points
  2
  I chose Beowulf because it is more alien and more removed from the present day than the Bible. The Bible has had significant influence on our present-day values and culture. Beowulf is still a human artifact containing human values, but extrapolating our current values from it would be very difficult. According to Claude, “Beowulf has nothing that reads as advocacy for animal rights or welfare, and the concept itself is anachronistic by roughly a millennium.”
  
  Your second point isn’t really a nitpick. Rather, it is the alignment problem itself. Nobody really knows how to solve it, but techniques such as inverse reinforcement learning or Building AIs that do human-like philosophy don’t run into the persona problem in the same way that techniques like RLHF do.