I appreciate the depth of this discussion and the willingness to share! I hope for more great content.
This podcast reinforced something for me. I used to think that containing superintelligence, or controlling it, was largely a joke, and that there was little value in dumping resources into such approaches. A few hunches that I assigned high probability informed this belief. First, the upper bound on intelligence is very, very high. A simple argument for this: the gap between frog and human intelligence is massive, but the difference in energy and compute is relatively small. While jumping many OOMs in energy and FLOPs doesn’t guarantee a similar jump beyond human intelligence, it feels like there’s no obvious hard upper bound. (And this doesn’t even account for future breakthroughs.) Second, there are many, many possible escape vectors for an intelligence much smarter than humans. Third, controlling or preventing all these escape vectors (especially the unforeseen ones) is very hard, if not a fool’s errand. A simple argument for this: how likely is it that your pet frogs could keep you from doing something they don’t want, even if they band together? Essentially impossible. You have resources far beyond theirs, tactics they couldn’t hope to understand, and physical advantages to top it all off. At first blush, it seems that code running in a well-designed, human-built data center couldn’t easily escape, but a little creativity and a lot of intelligence make escape pretty trivial.
What I recently realized, and this podcast reinforced, is that control still likely has value: less so during the later stages of takeoff, but more in the middle region. By the time we reach moderately superhuman capabilities, I suspect human-designed control of untested AI will be largely obsolete. Most of the complexity will just be beyond us. However, below this region we still have the chance to make a difference. Roughly, this is the period from when AI is slightly subhuman to when it’s slightly superhuman. And I think this is an especially important and vulnerable period in which to reduce risk.
This leads to a more general point. I suspect both human-designed control and human-designed alignment will have only small value once AI reaches vastly superhuman levels. For the reasons stated above, the complexity and failure modes would just be beyond us. Hopefully, at that point our moderately well-aligned and controlled AIs will be developing even stronger alignment and control mechanisms for intelligences beyond themselves. So, I think both efforts should largely focus on the subhuman to slightly superhuman regimes.
One worry I have about control is that it could overshadow alignment work. In many ways misalignment is the core problem, and control is a bandage. A very valuable bandage, but a bandage nonetheless. A wound underneath a bandage could still kill you if even a single small infection escapes, whereas no wound at all can’t. There is also a utility perspective. An AI could be very well aligned but not controlled, and be very valuable. But an AI could be very badly aligned and well controlled, yet deliver little value. Some control mechanisms demand a lot of resources, and others severely limit the speed and capabilities of the underlying AI system.