To be clear, I don’t envy the position of anyone who is trying to deploy AI systems and am not claiming anyone is making mistakes. I think they face a bunch of tricky decisions about how a model should behave, and those decisions are going to be subject to an incredible degree of scrutiny because they are relatively transparent (since anyone can run the model a bunch of times to characterize its behavior).
I’m just saying that how you feel about AI alignment shouldn’t be too closely tied up with how you end up feeling about those decisions. There are many applications of alignment, like “not doubling down on lies” and “not murdering everyone,” which should be extremely uncontroversial. In general, I think people ought to agree that it is better if customers, developers, and regulators can choose the properties of AI systems, rather than having those properties determined by technical contingencies of how AI is trained.