Thank you for writing this!
One argument for the “playbook” rather than the “plan” view is that there is a big difference between DISASTER (something very bad happening) and DOOM (an irrecoverable, extinction-level catastrophe). Consider the case of nuclear weapons. Arguably, the disaster of the Hiroshima and Nagasaki bombs led us to better arms control, which has so far helped prevent the catastrophe (even if not quite an existential one) of an all-out nuclear war. In all but extremely fast take-off scenarios, we should see disasters as warning signs before doom.
The good thing is that avoiding disasters is good business. In fact, I don’t expect AI labs to require any “altruism” to focus their attention on alignment and safety. This survey by Timothy Lee on self-driving cars notes that after a single tragic incident in which an Uber self-driving car killed a pedestrian, “Uber’s self-driving division never really recovered from the crash, and Uber sold it off in 2020. The rest of the industry vowed not to repeat Uber’s mistake.” Given that a single disaster can be extremely hard to recover from, smart leaders of AI labs should focus on safety, even if it means being a little slower to market.
While the initial push is to get AI to match human capabilities, as these tools become more than impressive demos and need to be deployed in the field, customers will care much more about reliability and safety than about raw capabilities. If I am a software company using an AI system as a programmer, it’s more useful to me if it can reliably deliver bug-free 100-line subroutines than if it writes 10K-line programs that might contain subtle bugs. There is a reason why much of the programming infrastructure for real-world projects, including pull requests, code reviews, and unit tests, is aimed not at getting something that kind of works out the door as quickly as possible, but at making sure that the codebase grows in a reliable and maintainable fashion.
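To make that last point concrete, here is a minimal sketch of the kind of guardrail I have in mind: a unit test that encodes the contract of a hypothetical AI-written subroutine, so that a subtle regression is caught in review rather than in production. The function and test below are made up for illustration and not taken from any particular codebase.

```python
# Minimal sketch: a unit test gating a hypothetical AI-written subroutine.
# The value is not in the function itself but in the test, which pins down
# the behavior the rest of the codebase is allowed to rely on.

def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals (imagine this was AI-generated)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
        else:
            merged.append([start, end])
    return merged


def test_merge_intervals():
    # A handful of cases that make the intended contract explicit.
    assert merge_intervals([]) == []
    assert merge_intervals([[1, 3], [2, 6], [8, 10]]) == [[1, 6], [8, 10]]
    assert merge_intervals([[1, 4], [4, 5]]) == [[1, 5]]  # touching intervals merge
```

A 100-line subroutine can be covered by tests like this and reviewed in a single pull request; a 10K-line program dropped in all at once cannot, which is exactly why I expect reliability, not raw capability, to be the binding constraint on deployment.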
This doesn’t mean that the free market will take care of everything, or that regulation isn’t needed to ensure that some companies don’t make a quick profit by deploying unsafe products and pushing the externalities onto their users and the broader environment. (Indeed, some would say that this is what happened in the self-driving domain...) But I do think there is a big commercial incentive for AI labs to invest in research on how to ensure that the systems they push out behave in a predictable manner, and don’t start maximizing paperclips.
p.s. The nuclear setting also offers another lesson (TW: grim calculations follow). It is much more than a factor of two harder to extinguish 100% of the population than to kill the roughly 50% that live in large metropolitan areas. Generally, the ratio between the effort needed to kill a 1-p fraction of the population and the effort needed to kill 50% should scale at least as 1/p.
There is a general phenomenon in tech, noted many times, of people over-estimating the short-term consequences of a technology and under-estimating the longer-term ones (e.g., “Amara’s law”).
I think it is often possible to see that current technology is on track to achieve X, where X is widely perceived as the main obstacle to the real-world application Y. But once you solve X, you discover a myriad of other “smaller” problems Z_1, Z_2, Z_3, ... that you need to resolve before you can actually deploy it for Y.
And of course, there is always a huge gap between demonstrating you have solved X on some clean academic benchmark and needing to do so “in the wild”. This is particularly an issue in self-driving, where errors can literally be deadly, but it arises in many other applications as well.
I do think that one lesson we can draw from self-driving is that there is a huge gap between full autonomy and “assistance” with human supervision. So I would expect to see AI deployed as (increasingly sophisticated) “assistants” well before AI systems are actually able to function as “drop-in” replacements for current human jobs. This is part of the point I was making here.