I believe this is the only way to design an AI whose actions we can remain confident are desirable, even once the AI is out of our hands and is augmenting itself to unfathomable capabilities.
I think unleashing AI in approximately the present world, whose infrastructural and systemic vulnerabilities I gestured at here, in the “Dealing with unaligned competition” section (in short: no pervasive trust systems that follow the money, an unconstrained “reach-anywhere” internet architecture, and information massively accumulated and centralised in the datacenters of a few big corporations), would be reckless anyway, even if we believed we had the design you are talking about. The lingering uncertainty about our own conclusions about the soundness of this design, together with “defence in depth” thinking (aka the security mindset), tells us that we should also prepare the global infrastructure and the global incentive system for the appearance of misaligned entities (or at least try to prepare).
See also later here, where I hypothesise that such a shift in infrastructure and incentives won’t become feasible until at least the creation of an “alignment MVP”.
I believe it needs to get out of our hands and augment itself to unfathomable capabilities in order to save the world.
Why? A contra position: “We don’t need AGI for an amazing future”. (I’m not saying I endorse it, because I haven’t read it; I’m just pointing out that such a position exists.)
I’m somewhat contra the idea that there is a special “alignment problem” that remains glaringly unsolved. I tried to express this in the post “For alignment, we should simultaneously use multiple theories of cognition and value” and in this conversation with Ryan Kidd. Sure, there are a lot of engineering and sometimes scientific problems to solve, and the strategic landscape, with many actors willing to develop AGI in the open source without any regard for alignment at all and to release it into the world, is very problematic. The properly secured global infrastructure, the right economic incentives, and the right systems of governance are also not in place. But I would say that even combining a number of existing approaches, from cooperative RL and linguistic feedback to Active Inference and shard theory, could already work (again, conditional on the right systemic incentives being instituted), without any new fundamental breakthroughs in the science of intelligence, alignment, ML/DL, or game theory.
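To make the “combination” claim slightly more concrete, here is a minimal toy sketch (my own illustration, not taken from the post above or from any of these approaches): score candidate actions under several independent value models, each standing in for a signal derived from a different theory (say, a cooperative-RL reward model, a linguistic-feedback reward model, an Active Inference expected-free-energy score, a shard-theory-style value head), and act conservatively on the aggregate. All names and numbers here are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 5  # candidate actions under consideration
N_MODELS = 4   # one hypothetical score per theory of value

# Hypothetical per-model scores for each candidate action, in [0, 1].
# In a real system these would come from trained models
# (reward models, value heads, etc.), not from random draws.
scores = rng.uniform(size=(N_MODELS, N_ACTIONS))

# Conservative aggregation: an action's overall value is its *worst*
# score across models, so strong disapproval by any one theory of
# value effectively vetoes the action.
aggregate = scores.min(axis=0)

best_action = int(np.argmax(aggregate))
print("per-model scores:\n", np.round(scores, 2))
print("conservative aggregate:", np.round(aggregate, 2))
print("chosen action:", best_action)
```

Min-aggregation is just one conservative choice among many; the point of the sketch is only that the machinery for combining signals from multiple theories is itself ordinary engineering, not a fundamental breakthrough.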