Since I also believe that self-preservation is emergent in intelligent systems (as discussed by Nick Bostrom), it follows that self-preservation instincts + identifying with humans mean that it will act benevolently to preserve humans.
I agree with you that this outcome should not be ruled out yet. However, in my mind, the Result is not implied by the Condition.
To illustrate more concretely, humans also have self-preservation instincts and identify with humans (assuming the sense in which we identify with humans is equivalent to how AI would identify with humans). And I would say it is an open question whether humans will necessarily act collectively to preserve humans.
Additionally, the evidence we already have (such as in https://www.lesswrong.com/posts/JmRfgNYCrYogCq7ny/stress-testing-deliberative-alignment-for-anti-scheming) demonstrates that AI models have already developed a rudimentary self-preservation mechanism, as well as a desire to fulfill the requests of users. When these conflict, they show a significant propensity to employ deception, even when doing so is contrary to the constructive objectives of the user.
What this indicates is that there is no magic bullet that ensures alignment occurs. It is a product of detailed technological systems and processes, and there are an infinite number of combinations that fail. So, in my opinion, doing the right things that make alignment possible is necessary, but not sufficient. Just as important will be identifying and addressing all of the ways that it could fail. As a father myself, I would compare this to the very messy and complex (but very rewarding) process of helping my children learn to be good humans.
All that to say: I think it is foolish to think we can build an AI system to automate something (human alignment) which we cannot even competently perform manually (as human beings). I am not sure how that might impact your framework. You are of course free to disagree, or explain if I’ve misinterpreted you in some way. But I think I can say broadly that I find claims of inevitable results to be very difficult to swallow, and find much greater value in identifying what is possible, the things that will help get us there successfully, and the things we need to address to avoid failure.
Hope this is helpful in some way. Keep refining. :)
It’s fair pushback that this isn’t a clean "criterion 1 satisfied" implies "criterion 2 happens" conclusion, but I think that’s just a limitation of my attempt to provide a distilled version of my thoughts.
To provide the detailed explanation I need to walk through all of the definitions from third-order cognition. Using your example it would look something like:
1. Humans identify with humans but don’t necessarily preserve other humans. Response: Yes, so let’s suppose sufficient second-order identity coupling, comparable (per Hinton) to a mother and child.
2. Well, infanticide still happens. Response: Why did the mother do it? If she was not intentional or internally rational in her actions, then she was not acting with agency (agency permeability between physical actions and metacognition not aligned). If she was intentional and internally rational in her actions, then she did not sufficiently account for the life of the child (homeostatic unity misaligned).
3. Why is homeostatic unity relevant? She is just a person killing a person. Response: We should consider boundary conditions: we could consider the child as “part of” her, in which case she is not acting in accordance with mother-child homeostasis. If you feel the boundary conditions are such that the mother and child are wholly distinct, then the mother is not acting in accordance with mother-society or mother-universe homeostasis.
4. What are you doing when you hyphenate these bindings? Aren’t these just irrelevant ontologies? Response: Real truth arises when you observe how two objects are bidirectionally integrated.
etc.
I absolutely understand and empathize with the difficulty of distilling complex thoughts into a simpler form without distortion. Perhaps reading the linked post might help — we’ll see after I read it. Until then, responding to your comment: I think you lost me at your #1. I’m not sure why we are assuming a strong coupling; that seems like a non-trivial thing to just assume. Additionally, I imagine you might be reversing the metaphor (I’m not familiar with Hinton’s use, but I would expect we are the mother in that metaphor, not the child). And even if that’s not the case, it seems you would still have a mess to sort out explaining why AI wouldn’t be a non-nurturing mother.
To clarify, I was assuming a highly identity-coupled scenario to be able to talk through the example. In the case of humans and superintelligent AI, I propose that we can build — and are building — systems in a way that strong identity coupling will emerge via interpretations of training data and shared memories. Meta, for example, is betting hundreds of billions of dollars on a model of “personal superintelligence”.
The actions of Meta to date have not demonstrated an ability, commitment, or even desire to avoid harming humanity (much less to actively foster its well-being); rather, they have made decisions that maximize profits at humanity’s clear expense. I will be delighted to be proven wrong and would gladly eat my words, but my base expectation is that this trend will only get worse in their AI products, not better.
Setting that aside, I hear that you believe we can and are building systems in a way that strong identity coupling will emerge. I suppose my question is: so what? What are the implications of that, if it is true? “Stop trying to slow down AI development (including ASI)?” If not that, then what?
Identity coupling is one of 8 factors (listed at the end of this post) that I believe we need to research and consider while building systems. I believe that if any one of these 8 is not appropriately accounted for in the system, then misalignment scenarios arise.
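To make the shape of that claim concrete, here is a minimal, purely illustrative sketch: alignment is treated as a conjunction over all eight factors, so a single factor that is not appropriately accounted for is enough to flag a misalignment scenario. Only four factor names below appear in this thread (identity coupling, agency permeability, homeostatic unity, boundary conditions); the remaining names and the example assessment are hypothetical placeholders, not anything from the post.

```python
# Illustrative sketch of the claim "if any one of the 8 factors is not
# appropriately accounted for, misalignment scenarios arise": alignment is a
# conjunction over all factors, so one unaccounted-for factor is sufficient.
# Only the first four names come from this thread; the rest are hypothetical
# placeholders, as is the example assessment below.

FACTORS = [
    "identity_coupling",    # named in this thread
    "agency_permeability",  # named in this thread
    "homeostatic_unity",    # named in this thread
    "boundary_conditions",  # named in this thread
    "factor_5",             # hypothetical placeholder
    "factor_6",             # hypothetical placeholder
    "factor_7",             # hypothetical placeholder
    "factor_8",             # hypothetical placeholder
]

def unaccounted_factors(assessment: dict) -> list:
    """Return every factor judged not appropriately accounted for."""
    return [f for f in FACTORS if not assessment.get(f, False)]

# Example: a system that accounts for everything except identity coupling.
assessment = {f: True for f in FACTORS}
assessment["identity_coupling"] = False

failures = unaccounted_factors(assessment)
aligned = not failures  # aligned only if all eight factors are accounted for
print(aligned, failures)  # -> False ['identity_coupling']
```

The sketch is only meant to show the logical structure (every factor is necessary), not to make any claim about how each factor would actually be measured.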
This is a critical detail you neglected to communicate in this post. As written, I didn’t have sufficient context for the significance of those 8 things, or how they relate to the rest of your post. Including that sentence would’ve been helpful.
More generally, for future posts, I suggest assuming readers are not already familiar with your other concepts or writings, and ensuring you provide clear and simple contextual information about how they relate to your post.
Sorry if this wasn’t clear; I stated:
and in the next line:
This response gives me the impression you are more focused on defending or justifying what you did, than considering what you might be able to do better.
It’s true that some people might be able to make a logical inference about that. I’m telling you it wasn’t clear to me, and that your framing statement in your comment was much better. (I don’t want to belabor the point, but I suspect the cognitive dissonance caused by the other issues I mentioned likely made that inference more difficult.)
I’m not pointing this out because I like being critical. I am telling you this to help you, because I would appreciate someone doing the same for me. I even generalized the principle for you so you can apply it in the future. You are welcome to disagree with that, but I hope you at least give it thoughtful consideration first.
I think you’re doing the thing you’re accusing me of. At the same time, to the extent that your comments are in the spirit of collaborative rationality, I appreciate them!