RogerDearnaley comments on The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?

RogerDearnaley 2 Jun 2025 9:38 UTC
3 points
We don’t need it to work in the infinite limit. (Personally, I’m assuming we’ll only be using this to align approximately human-level research assistants that help us do AI-Assisted Alignment research, so at a capability level where a failure might not be automatically disastrous.)