Do you think it’s similar to how different expectations and priors about what AI trajectories and capability profiles will look like often cause people to make different predictions on e.g. P(doom), P(scheming), etc. (like the Paul vs Eliezer debates)? Or do you think in this case there is enough empirical evidence that people ought to converge more? (I’d guess the former, but low confidence.)
I think several of the subquestions that matter for whether it’ll plausibly work to have AI solve alignment for us are in the second category, like the two points I mentioned in the post. There are other subquestions that are more in the first category, and those are also relevant to the odds of success. I’m relatively low confidence about this kind of stuff because of all the normal reasons why it’s difficult to say how other people should be thinking: it’s easy to miss relevant priors, evidence, etc. But still… given what I know about what everyone believes, it looks like these questions should be resolvable among reasonable people.