We don’t need it to work in the infinite limit. (Personally, I’m assuming we’ll only be using this to align approximately-human-level research assistants to help us do AI-Assisted Alignment research — so at a level where if we failed, it might not be automatically disastrous.)