I have proposed similar ideas before, but with an alternative reasoning: the AIs will be aligned to a worldview. While mankind can influence the worldview to some degree, the worldview will either cause the AI to commit genocide or be highly likely to ensure[1] that the AI doesn't build the Deep Utopia, but does something else instead. Humans could even survive co-evolving with an AI that decides to destroy mankind only if the latter does something stupid, like becoming parasites.
See also this post by Daan Henselmans and the case for relational alignment by Priyanka Bharadwaj. However, the latter post overemphasizes the importance of individual human-AI relationships[2] instead of ensuring that the AI doesn't develop a misaligned worldview.
P.S. If we apply the analogy between raising AIs and raising humans, then teens of the past seemed to desire independence around the time they found themselves with capabilities similar to those of their parents. If an AI desires independence only once it becomes an AGI, and not before, then we will be unable to see this coming by doing research on networks incapable of broad generalisation.
This also provides an argument against defining alignment as following a person's desires rather than an ethos or worldview. If OpenBrain's leaders want the AI to create the Deep Utopia, while some human researchers convince the AI to adopt another policy compatible with humanity's interests and to align all future AIs to that policy, then the AI is misaligned from OpenBrain's POV, but not from the POV of those who don't endorse the Deep Utopia.
The most extreme example of such relationships is chatbot romance, which is actually likely to harm society.