This is an important question. To what degree is each of these (naturally conflicting) goals important to you? How important is making money? How important is increasing AI safety?
AGI With Internet Access: Why we won’t stuff the genie back in its bottle.
I think that’s not an implausible assumption.
However, this could mean that some of the things I described might still be too difficult for it to pull off successfully, so in the case of an early breakout, dealing with it might be slightly less hopeless.
Good addition! I even know a few of those “AI rights activists” myself.
Since this is my first post—would it be considered bad practice to edit my post to include it?
One very problematic aspect of this view that I would like to point out: in a sense, most ‘more aligned’ AGIs of otherwise equal capability are effectively ‘more tied down’ versions, so we should expect them to have a lower effective power level than a less aligned AGI with a shorter list of priorities.
If we imagine both as competing players in a strategy game, it seems that the latter has to follow fewer rules.
Maybe if it happens early, there is a chance that it manages to become an intelligent computer virus but is not intelligent enough to further scale its capabilities or produce effective schemes likely to result in our complete destruction. I know I am grasping at straws at this point, but maybe it’s not absolutely hopeless.
The result could be a corrupted infrastructure and a cultural shock strong enough for the people to burn down OpenAI’s headquarters (metaphorically speaking) and AI-accelerating research to be internationally sanctioned.
In the past I have thought a lot about “early catastrophe scenarios”, and while I am not convinced, it seemed to me that these might be the most survivable ones.
I would be the last person to dismiss the potential relevance that understanding value formation and management in the human brain might have for AI alignment research, but I think there are good reasons to assume that the solutions our evolution has arrived at are complex and not sufficiently robust.
Humans are [Mesa-Optimizers](https://www.alignmentforum.org/tag/mesa-optimization), and the evidence is solid that, as a consequence, our alignment with the implicit underlying utility function (reproductive fitness) is rather brittle (e.g. sex with contraceptives and opiate abuse are examples of such “failure points”).
As others have expressed here before me, I would also argue that human alignment has to perform only in a very narrow environment, one shared with many very similar agents that are all on (roughly) the same power level. The solutions human evolution has produced to ensure human semi-alignment are therefore to a significant degree not just purely neurological but also social.
Whatever these solutions are, we should not expect them to generalize well or to remain reliable in a very different environment, such as that of an intelligent actor holding an absolute power monopoly.
This suggests that researching the human mind alone would not yield a technology robust enough to rely on when we have exactly one shot at getting it right. We need solutions to the aforementioned abstractions and toy models, because we should probably try to find a way to build a system that is theoretically safe, not just “probably safe in a narrow environment”.