This is pretty insightful, but I’m not sure the assumption holds that we would halt development if there were unsolved legible problems. The core issue might not be illegibility, but a risk-tolerance threshold in leadership that’s terrifyingly high.
Even if we legibly showed the powers that be that an AI had a 20% chance of causing a catastrophe due to unsolved safety problems, I’d expect competitive pressure to lead them to deploy such a system anyway.
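To make that concrete, here’s a toy expected-value sketch (every number in it is an assumption picked for illustration, not a claim about any real lab): once a rival is expected to deploy regardless, even a legible 20% catastrophe risk can leave deployment looking locally rational.

```python
# Toy race-dynamics model. All values below are illustrative assumptions.
P_CATASTROPHE = 0.20     # legible, acknowledged probability of catastrophe
V_WIN = 100.0            # assumed strategic/commercial value of deploying
V_CATASTROPHE = -1000.0  # assumed cost of catastrophe, borne by everyone

def expected_payoff(we_deploy: bool, rival_deploys: bool) -> float:
    """Crude expected payoff to one lab; catastrophe risk applies if anyone deploys."""
    if not (we_deploy or rival_deploys):
        return 0.0  # mutual restraint: no gain, no added risk
    upside = V_WIN if we_deploy else 0.0
    return (1 - P_CATASTROPHE) * upside + P_CATASTROPHE * V_CATASTROPHE

print(expected_payoff(True, True))    # -120.0: deploy alongside a deploying rival
print(expected_payoff(False, True))   # -200.0: hold back while the rival deploys
print(expected_payoff(False, False))  #    0.0: mutual restraint is best, but unstable
```

Mutual restraint is the best outcome in this toy model, but from each lab’s local perspective deploying dominates once the other side is expected to deploy, which is exactly the threshold problem I’m worried about.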
I notice the OP didn’t actually mention examples of legible or illegible alignment problems. Saying “leaders would be unlikely to deploy an unaligned AGI if they saw it had legible problem X” sounds a lot like saying “we would never let AGI onto the open internet, we can just keep it in a box”, back in the era before we deployed Sydney as soon as it caught the twinkle of a CEO’s eye.
I agree. I’ve been trying to develop some terminology that I think might help, at least for discussing the situation. I think “AI” is generally a vague and confusing term, and what we should actually be focused on are “Outcome Influencing Systems (OISs)”. A hypothetical ASI would be an OIS capable of influencing what happens on Earth regardless of human preferences; but humans are also OISs, as are groups of humans, and in fact the “competitive pressure” you mention is itself a very powerful OIS, one that is already misaligned and in many ways superhuman.
Is it too late to “unplug” or “align” all of the powerful misaligned OISs operating in our world? I’m hoping not, but I think the framing might be valuable for examining the issue, and maybe for avoiding some of the usual political issues involved in criticizing any specific powerful OIS that might happen to be influencing us towards potentially undesirable outcomes.
What do you think?