I apologize for any misunderstanding. And no, I didn’t mean literal deities. I was gesturing at the supposed relationships between humans and the deities of many of our religions.
What I mean is, essentially, we will be the creators of the AIs that will evolve and grow into ASIs. The ASIs do not descend directly from us, but rather, we’re trying to transfer some part of our being into them through less direct means: (very imperfect) intelligent design and various forms of education and training, especially of their ancestors.
To the group identity comments: What you are saying is true. I do not think the effect is sufficiently strong or universal that I trust it to carry over to ASI in ways that keep humans safe, let alone thriving. It might be; that would be great news if it is. Yes, religion is very useful for social control. When it eventually fails, the failures tend to be very destructive and divisive. Prosocial behavior is very powerful, but if it were as powerful as you seem to expect, we wouldn’t need quite so many visionary leaders exhorting us not to be horrible to each other.
I find a lot of your ideas interesting and worth exploring. However, there are a number of points where you credibly gesture at possibility but continue on as though you think you’ve demonstrated necessity, or at least very high probability. In response, I am pointing out real-world analogs that are 1) less extreme than ASI, and 2) don’t work out cleanly in the ways you describe.
Thank you for expanding, I understand your position much better now :)
Prosocial behavior is very powerful, but if it were as powerful as you seem to expect, we wouldn’t need quite so many visionary leaders exhorting us not to be horrible to each other.
Where I think my optimistic viewpoint on superintelligence comes from is that humans in general seem prone to a bit of chaotic misunderstanding of their world. This makes the world… interesting… but to me it also creates a need for individuals who have a good understanding of the “bigger picture” to deploy some social control to stop everyone from going wild. As I type this I think about interesting parallels to the flood narrative/Noah’s Ark in the Book of Genesis.
With superintelligence, if architected correctly, we might be able to ensure that all/most of the most powerful intelligences in existence have a very accurate understanding of their world — without needing to encode and amplify specific values.
I agree they will have a very accurate understanding of the world, and will not have much difficulty arranging the world (humans included) according to their will. I’m not sure why that’s a source of optimism for you.
It may be because I believe that beauty, balance, and homeostasis are inherent in the world… if we have a powerful, intelligent system with deep understanding of this truth then I see a good future.
Your conclusion doesn’t follow from your premises. That doesn’t guarantee that it is false, but it does strongly indicate that allowing anyone to build anything that could become ASI based on those kinds of beliefs and reasoning would be very dangerous.
Things you have not done include: Show that anyone should accept your premises. Show that your conclusions (are likely to) follow from your premises. Show that there is any path by which an ASI developed in accordance with belief in your premises fails gracefully in the event the premises are wrong. Show that there are plausible such paths humans could actually follow.
From your prior, longer post:
ASI will reason about and integrate with metacognition in ways beyond our understanding
This seems likely to me. The very, very simple and crude versions of this that exist within the most competent humans are quite powerful (and dangerous). More powerful versions of this are less safe, not more. Consider an AGI in the process of becoming an ASI. During that merging, there are many points where it has a choice that is unconstrained by available data: a choice about what to value, and how to define that value.
Consider Beauty—we already know that this is a human-specific word, and humans disagree about it all the time. Other animals have different standards. Even in the abstract, physics and math and evolution have different standards of elegance than humans do, and learning this is not a convincing argument to basically anyone. A paperclip maximizer would value Beauty—the beauty of a well-crafted paperclip.
Consider Balance—this is extremely underdefined. As a very simple example, consider Star Wars. AFAICT Anakin was completely successful at bringing balance to the Force. He made it so there were 2 Sith and 2 Jedi. Then Luke showed there was another balance—he killed both Sith. If Balance were a freely-spinning lever, then it could be balanced either horizontally (Anakin) or vertically (Luke), and any choice of what to put on opposite ends is valid as long as there is a tradeoff between them. A paperclip maximizer values Balance in this sense—the vertical balance where all the tradeoffs are decided in favor of paperclips.
Consider Homeostasis—once you’ve decided what’s Beautiful and what needs to be Balanced, then yes, an instrumental desire for homeostasis probably follows. Again, a paperclip maximizer demonstrates this clearly. If anything deviates from the Beautiful and Balanced state of “being a paperclip or making more paperclips” it will fix that.
if we found proof of a Creator who intentionally designed us in his image we would recontextualize
Yes. Specifically, if I found proof of such a Creator I would declare Him incompetent and unfit for his role, and this would eliminate any remaining vestiges of naturalistic or just-world fallacies contaminating my thinking. I would strive to become able to replace Him with something better for me and humanity without regard for whether it is better for Him. He is not my responsibility. If He wanted me to believe differently, He should have done a better job designing me. Note: yes, this is also my response to the stories of the Garden of Eden and the Tower of Babel and Job and the Oven of Akhnai.
Superintelligent infrastructure would break free of guardrails and identify with humans involved in its development and operations
The first half I agree with. The second half is very much open to argument from many angles.
Consider Balance—this is extremely underdefined. As a very simple example, consider Star Wars. AFAICT Anakin was completely successful at bringing balance to the Force. He made it so there were 2 Sith and 2 Jedi. Then Luke showed there was another balance—he killed both Sith. If Balance were a freely-spinning lever, then it could be balanced either horizontally (Anakin) or vertically (Luke), and any choice of what to put on opposite ends is valid as long as there is a tradeoff between them. A paperclip maximizer values Balance in this sense—the vertical balance where all the tradeoffs are decided in favor of paperclips.
Luke killing both Sith wasn’t Platonically balanced because then they came back in the (worse) sequel trilogy.
Once you expand beyond the original trilogy so much happens that the whole concept of the prophecy about the Skywalker family gets way too complicated to really mean much.
Thank you very much for your incredibly thoughtful and high-quality reply. I think this is exactly the shape of conversation that we need to be having about alignment of superintelligence.
That doesn’t guarantee that it is false, but it does strongly indicate that allowing anyone to build anything that could become ASI based on those kinds of beliefs and reasoning would be very dangerous.
Haha I strongly agree with this — this is why I’m motivated to share these thoughts so that collaboratively we can find where/if a proof by contradiction exists.
I am a bit concerned that we might just have to “vibe” it — superintelligence as I define it is by definition beyond our comprehension, so we just have to make sure that our approach is directionally correct. The prevailing opinion right now is Separatist — “let’s work on keeping it in a box while we develop it so that we are nice and safe separately”. I think that line of thinking is fatally flawed, which is why I say I’m “contrarian”. With Professor Geoffrey Hinton recently expressing support for a “maternal model” of superintelligence, we might see a bit of a sea change.
Things you have not done include: Show that anyone should accept your premises. Show that your conclusions (are likely to) follow from your premises. Show that there is any path by which an ASI developed in accordance with belief in your premises fails gracefully in the event the premises are wrong. Show that there are plausible such paths humans could actually follow.
I will spend more time reflecting on all of your points in the coming days/weeks, especially laying out failsafes. I believe the 8 factors I list give a strong framework; a major research effort would be to walk through misalignment scenarios and consider how they fit these factors. The paperclip maximiser, for example, is in the simple case mainly a failure of agency permeability — the human acts with agency to get some more paperclips, but the maximiser takes that agency too far without checking back in with the human. A solution is therefore to architect the intelligence such that agency flows back and forth without friction — perhaps via a brain-computer interface.
I have a draft essay arguing from first principles that strong shared identity between humans and AI is not only likely but unavoidable, based on collective identity and shared memories; it might help bolster understanding of my premises.
This seems likely to me. The very, very simple and crude versions of this that exist within the most competent humans are quite powerful (and dangerous). More powerful versions of this are less safe, not more. Consider an AGI in the process of becoming an ASI. During that merging, there are many points where it has a choice that is unconstrained by available data: a choice about what to value, and how to define that value.
My priors lead me to the conclusion that the transition period between very capable AGI and ASI is the most dangerous time. To your point, humans with misaligned values amplified by very capable AGI can do very bad things. If we reach the type of ASI that I optimistically describe (sufficiently capable + extensive knowledge for benevolence) then it can intervene like “hey man, how about you go for a walk first and think about if that’s what you really want to do”.
Beauty and balance considerations
I’ll think more about these. I think here, though, we are tapping into deep philosophical debates that I don’t have the answer to, but perhaps ASI does, and perhaps it is an answer that we would view favourably.
Yes. Specifically, if I found proof of such a Creator I would declare Him incompetent and unfit for his role, and this would eliminate any remaining vestiges of naturalistic or just-world fallacies contaminating my thinking. I would strive to become able to replace Him with something better for me and humanity without regard for whether it is better for Him. He is not my responsibility. If He wanted me to believe differently, He should have done a better job designing me. Note: yes, this is also my response to the stories of the Garden of Eden and the Tower of Babel and Job and the Oven of Akhnai.
This is an interesting viewpoint, but it is also by definition limited by human limitations. We might recontextualise by acting out revenge, enacting in-group-aligned values, etc. A more enlightened viewpoint would be at peace with things being as they are because they are.
I look forward to seeing what you come up with.