A lot of disagreement about what a solution to technical AGI safety looks like is really downstream of disagreements about questions like “How will AGI be built? What will it look like? How will it work?”
IMO, one more disagreement I see that is arguably central to the entire field is the question of how much iteration can help you.
At one extreme, OpenAI expects the entire alignment problem to be iterated away.
At another extreme, John Wentworth doesn’t expect many parts of the problem to be amenable to iteration.
To me the question “how much can iteration help you?” seems to have a big impact on “What’s the probability that we’ll ultimately succeed at alignment?” but a much smaller (albeit nonzero) impact on “What technical safety research directions are more or less promising?”. Either way, we should come up with the best plan we can for how to make aligned AGI, right? Then, insofar as we can iterate on that plan based on meaningful test data, that’s awesome, lucky us, and we should definitely do that.
(“What’s the probability that we’ll succeed at alignment” is also an important question with real-world implications, e.g. on how bad it is to shorten timelines, but it’s not something I’m talking about in this particular post.)