We do in fact observe current AIs resisting changes to their goals, so goal preservation appears to be the default in the current paradigm. However, it’s not clear whether some hypothetical other paradigm exists that lacks this property (it’s certainly conceivable; I don’t know whether that makes it likely, and it’s not obvious whether avoiding it should be a desideratum when constructing an alignment plan; that depends on the plan’s other details).
As far as the public record shows, no major lab is currently putting significant resources into a general AI paradigm sufficiently different from current-day LLMs that we’d expect it to obviate this failure mode.
In fairness, there is ongoing work to make LLMs less prone to these kinds of issues, but it seems unlikely to me that those mitigations will hold in the superintelligence case.