I think several of the subquestions that matter for whether it's plausible that having AI solve alignment for us will work fall into the second category, like the two points I mentioned in the post. There are other subquestions that fall more into the first category, and those are also relevant to the odds of success. I'm relatively low-confidence about all of this, for the usual reasons it's hard to say how other people should be thinking: it's easy to miss relevant priors, evidence, etc. But still… given what I know about what everyone believes, these questions look like they should be resolvable among reasonable people.