Thank you very much for your incredibly thoughtful and high-quality reply. I think this is exactly the shape of conversation that we need to be having about alignment of superintelligence.
That doesn’t guarantee that it is false, but it does strongly indicate that allowing anyone to build anything that could become ASI based on those kinds of beliefs and reasoning would be a very dangerous risk.
Haha, I strongly agree with this; it is why I’m motivated to share these thoughts, so that collaboratively we can find whether (and where) a proof by contradiction exists.
I am a bit concerned that we might just have to “vibe” it: superintelligence as I define it is by definition beyond our comprehension, so we just have to make sure that our approach is directionally correct. The prevailing opinion right now is Separatist: “let’s work on keeping it in a box while we develop it so that we are nice and safe separately”. I think that line of thinking is fatally flawed, which is why I say I’m “contrarian”. With Professor Geoffrey Hinton recently seeing value in a “maternal model” of superintelligence, we might see a bit of a sea change.
Things you have not done include: Show that anyone should accept your premises. Show that your conclusions (are likely to) follow from your premises. Show that there is any path by which an ASI developed in accordance with belief in your premises fails gracefully in the event the premises are wrong. Show that there are plausible such paths humans could actually follow.
I will spend more time reflecting on all of your points in the coming days/weeks, especially on laying out failsafes. I believe the 8 factors I list give a strong framework: a major research effort would be to walk through misalignment scenarios and consider how they map onto these factors. The paperclip maximiser, in the simple case, is mainly a failure of agency permeability: the human acts with agency to get some more paperclips, but the maximiser takes that agency too far without checking back in with the human. A solution therefore becomes architecting the intelligence so that agency flows back and forth without friction, perhaps via a brain-computer interface.
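To make the agency-permeability point a bit more concrete, here is a rough, purely illustrative sketch (the names, threshold, and structure are all made up for this comment, not part of any existing system): a maximiser loop that hands agency back to the human whenever a proposed action would exceed the scope it was delegated.

```python
# Hypothetical sketch of "agency permeability" as a control-loop property:
# the optimiser returns agency to the human whenever a proposed action
# exceeds the scope originally delegated to it. All names and numbers
# here are illustrative.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    description: str
    estimated_impact: float  # fraction of available resources this action would consume


def human_approves(action: Action) -> bool:
    """Stand-in for the 'check back in' step; could be a prompt, a BCI signal, etc."""
    reply = input(f"Proceed with '{action.description}'? [y/N] ")
    return reply.strip().lower() == "y"


def execute(action: Action) -> None:
    """Stand-in effector; a real system would act on the world here."""
    print(f"Executing: {action.description}")


def run_maximiser(propose: Callable[[], Optional[Action]],
                  delegated_impact: float = 0.05) -> None:
    """Loop over proposed actions, handing agency back to the human whenever
    an action's estimated impact exceeds the delegated threshold."""
    while (action := propose()) is not None:
        if action.estimated_impact > delegated_impact:
            if not human_approves(action):
                break  # the human reclaims agency; optimisation stops here
        execute(action)


# Toy usage: the first action stays within scope, the second triggers a check-in.
if __name__ == "__main__":
    queue = [Action("order one box of paperclips", 0.01),
             Action("convert the office budget to paperclips", 0.60)]
    run_maximiser(lambda: queue.pop(0) if queue else None)
```

A brain-computer interface would just be one way of making that check-in step near-frictionless.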
I have a draft essay arguing from first principles that strong shared identity between humans and AI is not only likely but unavoidable, based on collective identity and shared memories; it might help bolster understanding of my premises.
This seems likely to me. The very, very simple and crude versions of this that exist within the most competent humans are quite powerful (and dangerous). More powerful versions of this are less safe, not more. Consider an AGI in the process of becoming an ASI. In the process of such merging, there are many points where it has a choice that is unconstrained by available data. A choice about what to value, and how to define that value.
My priors lead me to the conclusion that the transition period between very capable AGI and ASI is the most dangerous time. To your point, humans with misaligned values amplified by very capable AGI can do very bad things. If we reach the type of ASI that I optimistically describe (sufficiently capable + extensive knowledge for benevolence), then it can intervene like “hey man, how about you go for a walk first and think about whether that’s what you really want to do”.
Beauty and balance considerations
I’ll think more about these. I think, though, that here we are tapping into deep philosophical debates that I don’t have the answer to; perhaps ASI does, and it is an answer that we would view favourably.
Yes. Specifically, if I found proof of such a Creator I would declare Him incompetent and unfit for His role, and this would eliminate any remaining vestiges of naturalistic or just-world fallacies contaminating my thinking. I would strive to become able to replace Him with something better for me and humanity without regard for whether it is better for Him. He is not my responsibility. If He wanted me to believe differently, He should have done a better job designing me. Note: yes, this is also my response to the stories of the Garden of Eden and the Tower of Babel and Job and the Oven of Akhnai.
This is an interesting viewpoint, but it is also by definition bounded by human limitations. We might recontextualise by acting out revenge, enacting in-group-aligned values, etc. A more enlightened viewpoint would be at peace with things being as they are, because they are.
I look forward to seeing what you come up with.