This sort of “meta-strategy” would be far more effective if we knew exactly where the red button was (i.e. the capability level at which AGI would become truly dangerous and slip out of our control). With perfect knowledge of where that button sat, the counter-intuitively optimal strategy would be to open-source everything and allow, or even positively invite, every sort of potentially harmful use of AGI right up until that point. We would get many (hopefully minuscule) AI-Chernobyls: many small-scale empirical examples of instrumental convergence, mesa-optimization, out-of-distribution behavior, etc. Probably enough examples even for mainstream laypeople to grok these concepts.
Then, under this ideal scenario, society would collectively turn on a dime, applying every lesson learned from the reckless epoch to making AI alignment provable and ironclad before taking even a single additional step forward.
The obstacles to employing this ideal meta-strategy are:
1. Not knowing exactly where the red button is (i.e. the level at which AGI would forever slip out of our control).
2. Not having the coordination among humans needed to stop on a dime as we approach that level, so that we can shift our object-level strategy in line with the overall meta-strategy (which, to be clear, is an object-level strategy of recklessness up until we approach AGI escape, followed by the opposite object-level strategy of extreme caution from that point onwards).
Sure, but there is probably some strategy better than just pushing towards blue as hard as possible.
Getting more concrete: I highly doubt that Stable Diffusion non-negligibly increased the probability of AGI. We can choose what to accelerate!