I agree with you that the “stereotyped image of AI catastrophe” is not what failure will most likely look like, and it’s great to see more discussion of alternative scenarios. But why exactly should we expect that the problems you describe will be exacerbated in a future with powerful AI, compared to the state of contemporary human societies? Humans also often optimise for what’s easy to measure, especially in organisations. Is the concern that current ML systems are unable to optimise hard-to-measure goals, or goals that are hard to represent in a computerised form? That is true but I think of this as a limitation of contemporary ML approaches rather than a fundamental property of advanced AI. With general intelligence, it should also be possible to optimise goals that are hard to measure.
Similarly, humans / companies / organisations regularly exhibit influence-seeking behaviour; this can cause harm, but it’s usually possible to keep it in check, at least to a certain degree.
So, while you point at things that can plausibly go wrong, I’d say that these are perennial issues that may become better or worse during and after the transition to advanced AI, and it’s hard to predict what will happen. Of course, this does not make a very appealing tale of doom – but maybe it would be best to dispense with tales of doom altogether.
I’m also not yet convinced that “these capture the most important dynamics of catastrophe.” Specifically, I think the following are also potentially serious issues:
- Unfortunate circumstances in future cooperation problems between AI systems (and / or humans) result in widespread defection, leading to poor outcomes for everyone.
- Conflicts between key future actors (AI or human) result in large quantities of disvalue (agential s-risks).
- New technology leads to radical value drift of a form that we wouldn’t endorse.
> But why exactly should we expect that the problems you describe will be exacerbated in a future with powerful AI, compared to the state of contemporary human societies?
To a large extent “ML” refers to a few particular technologies that have the form “try a bunch of things and do more of what works” or “consider a bunch of things and then do the one that is predicted to work.”
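(As a rough, non-authoritative illustration of these two patterns, here is a toy Python sketch; the proxy signal, the candidate “things”, and all function names are invented for the example rather than taken from any real system.)

```python
import random

# Toy sketch of the two patterns: measured_reward stands in for an
# easy-to-measure proxy signal, and the candidate "things" are just numbers.

def measured_reward(x):
    """A noisy, easy-to-measure proxy for how well a choice x 'works'."""
    return -(x - 3.0) ** 2 + random.gauss(0, 0.1)

# Pattern 1: "try a bunch of things and do more of what works."
def trial_and_error(n_rounds=200, step=0.5):
    x = 0.0
    for _ in range(n_rounds):
        candidate = x + random.uniform(-step, step)
        if measured_reward(candidate) > measured_reward(x):
            x = candidate  # keep whatever the proxy says worked better
    return x

# Pattern 2: "consider a bunch of things and do the one predicted to work."
def plan_with_predictor(candidates, predictor):
    # here the 'predictor' is just the proxy itself; in practice it would be
    # a learned model trained on the same easy-to-measure signal
    return max(candidates, key=predictor)

if __name__ == "__main__":
    print("trial and error settles near:", round(trial_and_error(), 2))
    candidates = [i * 0.5 for i in range(20)]
    print("planning picks:", plan_with_predictor(candidates, measured_reward))
```

In both cases the optimisation pressure is pointed at whatever signal is actually measured, which is where the easy-to-measure bias comes from.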
> That is true but I think of this as a limitation of contemporary ML approaches rather than a fundamental property of advanced AI.
I’m mostly aiming to describe what I think is in fact most likely to go wrong; I agree it’s not a general or necessary feature of AI that its comparative advantage is optimizing easy-to-measure goals.
(I do think there is some real sense in which getting over this requires “solving alignment.”)
> To a large extent “ML” refers to a few particular technologies that have the form “try a bunch of things and do more of what works” or “consider a bunch of things and then do the one that is predicted to work.”
Why not “try a bunch of measurements and figure out which one generalizes best” or “consider a bunch of things and then do the one that is predicted to work according to the broadest variety of ML-generated measurements”? (I expect there’s already some research corresponding to these suggestions, but more could be valuable?)
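(A minimal sketch of what that suggestion could look like, assuming an ensemble of independently learned “measurements” and a worst-case aggregation rule; everything below, including the toy measurement models and their names, is invented for illustration.)

```python
import random

# Hypothetical sketch: score candidates under several different learned
# "measurements" and prefer the one that looks good under all of them,
# rather than under a single proxy. The measurement models are toy stand-ins.

def make_toy_measurements(n=5):
    """Stand-ins for independently trained measurement models."""
    offsets = [random.uniform(-1.0, 1.0) for _ in range(n)]
    return [lambda x, o=o: -(x - 3.0 - o) ** 2 for o in offsets]

def robust_choice(candidates, measurements):
    # "predicted to work according to the broadest variety of measurements":
    # here read as maximising the worst-case score across the ensemble
    return max(candidates, key=lambda x: min(m(x) for m in measurements))

if __name__ == "__main__":
    measurements = make_toy_measurements()
    candidates = [i * 0.25 for i in range(40)]
    print("robust pick:", robust_choice(candidates, measurements))
```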