I think that has a fair bit of merit, but seems like a very different argument to the one they’re making, because they’re explicitly putting forth character as something orthogonal to alignment. In beginning of the essay they say
The core argument for the importance of AI character is that it will meaningfully impact:
(i) a range of challenges that arise even if we solve the technical alignment problem— like concentration of power, good moral reflection, risk of global catastrophe, and risk of global conflict
If they thought character was a way to help solve alignment, for example because it creates the seeds that determine the reflective fixed-point an agent ends up in after presumable intelligence amplification and lots of deliberation, that’s fine. But then they wouldn’t say that a core reason for focusing on character is the impacts it has even conditional on technical alignment being solved. Because technical alignment being solved screens off those effects.
I think that has a fair bit of merit, but seems like a very different argument to the one they’re making, because they’re explicitly putting forth character as something orthogonal to alignment. In beginning of the essay they say
If they thought character was a way to help solve alignment, for example because it creates the seeds that determine the reflective fixed-point an agent ends up in after presumable intelligence amplification and lots of deliberation, that’s fine. But then they wouldn’t say that a core reason for focusing on character is the impacts it has even conditional on technical alignment being solved. Because technical alignment being solved screens off those effects.