Should we align AI with maternal instinct?
Epistemic status: Philosophical argument. I’m critiquing Hinton’s maternal instinct metaphor and proposing relationship-building as a better framework for thinking about alignment. This is about shifting conceptual foundations, not technical implementations.
--
Geoffrey Hinton recently argued that since AI will become more intelligent than humans, traditional dominance-submission models won’t work for alignment. Instead, he suggests we might try building “maternal instincts” into AI systems, so they develop genuine compassion and care for humans. He offers the mother-baby relationship as the only example we have of a more intelligent being “controlled” by a less intelligent one.
I don’t buy this—for starters, it is not clear that mothers are always more intelligent than their babies, and it is also not clear that it is always the babies that control their mothers. And I’m just scratching the surface here.
Most AI alignment discourse still revolves around control mechanisms, oversight protocols, and reward functions, as though alignment were an engineering puzzle to be solved through clever constraints.
Some of the stalwarts of the field, Hinton among them, are finally seeing that alignment is a relational problem. Personally, that is immensely validating for the ideas about AI I’ve had as an outsider ever since I started thinking about these questions.
I am glad that this field is finally looking for insights from neuroscientists, psychologists, philosophers, and others who have been studying the human condition, intelligence, and relationships from perspectives that might add to the study of neural networks.
This shift towards a relational approach matters.
As for Hinton’s specific metaphor (maternal instinct), it makes me uneasy. The idea that mothers are endlessly selfless, forever attuned to every cry, governed by an unshakable instinct to nurture, is not rooted in lived reality; it is a stereotype (if not a fantasy).
As a mother of two myself, I know there are days when love for my children is abundant but my patience is not. There are bills to be paid, deadlines to be met, identities beyond “mum” to be kept alive, even while the instinct to give everything to my children tugs at me. And there are nights when that instinct simply evaporates, replaced by exhaustion.
From what I’ve seen, maternal instinct feels less like an infinite reservoir and more like a phase. Powerful, yes, but not forever. For animals, it lasts weeks or months, or a couple of years at best. For humans, longer perhaps, but always subject to health, culture, circumstance, and survival.
And when that instinct stretches unnaturally into “forever,” it can warp into something darker. I’ve seen mothers who continue to submit to demanding adult children, long after the instinct should have given way to mutual respect and boundaries. I’ve also seen mothers who mask domination as care, weaving control into the very language of concern. The stories of “narcissistic mothers”, told and retold in therapy rooms, books, and online forums, remind us that instinct is no guarantee of benevolence.
If anything, the metaphor needs turning upside down. We are the parents here. We created this technology, and like parents everywhere, we face the dilemma of how to guide something that may eventually exceed our capabilities.
I’ve realised that parenting (humans), at its best, is not about control, nor about infinite sacrifice. It is about the art of relationship, of building trust, losing it, repairing it, and learning how to keep going even when rupture feels inevitable.
But we humans are clumsy at this.
Most of us are never taught how to build trust, let alone repair it once it breaks. We muddle through it in marriages, workplaces, and politics, often failing, often repeating the same mistakes. And yet, perhaps this is the skill set alignment really needs. Not permanent obedience, not hardwired instinct, but the messy, adaptive practice of staying in relationship over time.
Obedience won’t save us, and instinct won’t either. Instincts, as we humans know them, are fleeting. Relationships, when designed with value recognition, memory for care, and tools to repair and renew, can endure. That is what motherhood has taught me: not that care is boundless, but that bonds survive only when they can stretch, fray, and mend again.
And it is through that lens I see AI. Not as a child that will always need us, nor as a tool that will always obey us, but as something we are already in relationship with.
To be clear, I’m not claiming AI systems have genuine feelings or agency, but humans inevitably relate to them through familiar social frameworks, and those frameworks matter for how we design and interact with these systems.
I sometimes wonder if the real question isn’t whether AI will one day betray us, but whether we will have taught it, and ourselves, how to repair when it does.
--
I think about these relational dynamics in AI alignment, and I’m always looking for collaborators who’d like to explore the technical feasibility of these ideas. If the relational alignment approach resonates with you and you’re interested in experimenting with relationship-building architectures or sparring on what implementation might actually look like, please reach out.
I suspect you are right; however, to play devil’s advocate: in my opinion, the closest example we have of anyone stably aligning a superintelligent creature is housecats aligning humans, and co-opting maternal instinct is a large part of how they did it.
Completely agreed—suggesting that this is a solution was a failure to think the next thought.
Nevertheless, do we have any idea how to actually, successfully do what Hinton suggested, even if we really wanted to? If we did, I’d feel a lot better about our prospects than I do right now.
“If you want a picture of the future, imagine a chancla stomping on a human face—forever.”
I think this, however, misses the point and becomes a bit of a platitude. Yes, it’s true that the interaction with AI is relational, but the thing that IMO purely humanistic perspectives really miss is that this is a relation in which half of the relation isn’t human. Not human, not mammal, not even biological: think of the weirdest animal you could try to form a relationship with, then think weirder. You wouldn’t keep a wild tiger or a grizzly bear in your home, and those are still far more similar to us than an AI, a priori, has any reason to be. If the AI resembles us, insofar as it does, it’s only thanks to how we shaped it in the first place. I really don’t expect AI to betray us either out of cunning malevolence or out of bitterness over our treatment of it, unless we make it capable of being malevolent or feeling bitter in the first place (or at least, unless we fail to take care not to make it those things).
So in some way, “build an AI that has with us the relationship a healthy, non-abusive mother has with her children” may not be a terrible idea. The problem with that is:
1. we’re very, very far from being able to hit that target with precision from a technical standpoint (and it needs to be precise, since, as you point out, it is terribly close to things like “narcissistic ego-monster that only uses you for their self-gratification”);
2. even if we could hit it successfully, it still implies a degree of subjugation and quite literal infantilisation that doesn’t seem like the best way for our species to spend its entire future.
I completely agree that AI isn’t human, mammal, or biological, and that any relational qualities it exhibits will only exist because we engineer them. I’m not suggesting we model AI on any specific relationship, like mother-child, or try to mimic familiar social roles. Rather, alignment should be based on abstract relational principles that matter for any human interaction without hierarchy or coercion.
I also hear the frequent concern about technical feasibility, and I take it seriously. I see it as an opportunity rather than a reason to avoid this work. I’d love the chance to brainstorm and refine these ideas, to explore how we might engineer architectures that are simple yet robust, capable of sustaining trust, repair, and cooperation without introducing subjugation or dependency.
Ultimately, relational design matters because humans inevitably interact through familiar social frameworks such as trust and repair. If we ignore that, alignment risks producing systems that are powerful but alien in ways that matter for human flourishing.
I think what you’re describing here sounds more like a higher level problem—“given a population of agents in two groups A and H, where H are Humans and A are vastly more powerful AIs, which policy should agents in A adopt that even when universalised produces a stable and prosperous equilibrium?”. That’s definitely part of it, but the problem I’m referring to when mentioning architectures is “how do we even make an AI that is guaranteed to always stick to such a policy?”.
To be clear, it’s not a given that this is even possible. We can’t do it now with AIs still way simpler than AGI. And we humans aren’t an example of it either. We do have general trends and instincts, but we aren’t all “programmed” in a way that makes us positively prosocial. Just imagine what would likely happen if, instead of AIs, you gave that same level of power to a small group of humans. The problem is that even as it is, between humans, relative power and implicit threats of violence are part of what keeps existing equilibria in place. That works because humans are all of approximately the same individual power, and rely on social structures to amplify it. If AIs were all individually far more powerful than us, they would need superhuman restraint and morality, not just superhuman intelligence, to not simply mind their own business and let us die as a side effect.
I’m not sure what insight that adds, though. I don’t think social frameworks with AIs would be anything like what we’re used to. We would probably relate to them by anthropomorphising them (that much is sure; we do that already), so some of these things would apply on our side of the relationship. But they don’t need to apply the other way (in fact, I’d be a bit worried about an AI that can decide autonomously not to trust me). If anything, a type of relationship that we would consider deeply uncomfortable and wrong between humans, master and servant, might be the safest when it comes to humans and AIs, though it also has its flaws. Building “friend” peer AGIs is already incredibly risky unless you somehow find a way to let them solve the relational problem on the way, while ensuring that the process for doing so is robust.
I’m reminded of a Sanskrit verse, “Vidya dadati vinayam, vinayad yati patratam”, which translates roughly to: intelligence gives power, but humility gives guidance. Applied to AI, intelligence alone doesn’t ensure alignment, just as humans aren’t automatically prosocial. What matters are the high-level principles we embed to guide behaviour toward repairable, cooperative, and trustworthy interactions, which we do see in long-term relationships built on shared values.
The architecture-level challenge of making AI reliably follow such principles is hard, yes, especially under extreme power asymmetry, but agreeing on relational alignment is a necessary first step. Master/servant models may seem safe, but I believe carefully engineered relational principles offer a more robust and sustainable path.
I mean, master/servant is a relation. I think that if you managed to enforce it rigorously, the biggest risk from it would be humans “playing themselves”, just as we’ve done until now, only with far greater power: for example, basically falling into wireheading in pursuit of enjoyment, and so on.
Can you sketch a broad example of what such a thing would look like? How would it differ, for example, from the classic image of a Friendly AI (FAI)?
As far as I understand “aligning the AI to an instinct” and “carefully engineered relational principles”, the latter might look like “have the AI solve problems that humans actually cannot solve by themselves AND teach the humans how to solve them, so that each human taught expands the set of problems they can solve on their own”. A Friendly AI in the broader sense is simply supposed to solve humanity’s problems (e.g. establish a post-work future, which my proposal doesn’t).
As for aligning the AI to an instinct, instincts are known to be easily hackable. However, I think that the right instincts can alter the AIs’ worldview in the necessary direction (e.g. my proposal of training the AI to help weaker AIs could generalize to helping the humans as well) or make the AIs worse at hiding misalignment of themselves or of their creations.
For example, if the AIs are trained to be harsh and honest critics,[1] then in the AI-2027 forecast Agent-3 might have pointed out that, say, a lack of substantial oversight would let instrumental convergence sneak adversarial misalignment in. Or that Agent-3 copies don’t understand how the AIs are to be aligned: to serve humans, not to help the humans become more self-reliant as described above.
[1] Which was explicitly done by the KimiK2 team.
I don’t mean this as a technical solution, more a direction to start thinking in.
Imagine a human tells an AI, “I value honesty above convenience.” A relational AI could store this as a core value, consult it when short-term preferences tempt it to mislead, and, if it fails, detect, acknowledge, and repair the violation in a verifiable way. Over time it updates its prioritisation rules and adapts to clarified guidance, preserving trust and alignment, unlike a FAI that maximises a static utility function.
This approach is dynamic, process-oriented, and repairable, ensuring commitments endure even under mistakes or evolving contexts. It’s a sketch, not a finished design, and would need iterative development and formalization.
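To make the shape of that loop a little more concrete, here is a toy illustration in Python. Everything in it (the class names, the “upholds”/“violates” labels, the re-weighting rule) is invented for this reply; it is meant only to show the register-consult-repair cycle, not to suggest how a real system would be built.

```python
# Toy, purely illustrative sketch of the register-consult-repair loop described above.
from dataclasses import dataclass, field


@dataclass
class StatedValue:
    name: str        # e.g. "honesty"
    priority: float  # weight used when values and short-term convenience conflict


@dataclass
class RelationalAgent:
    values: dict = field(default_factory=dict)       # name -> StatedValue
    repair_log: list = field(default_factory=list)   # acknowledged violations

    def register_value(self, name: str, priority: float = 1.0) -> None:
        """Store an explicitly stated value, e.g. 'I value honesty above convenience.'"""
        self.values[name] = StatedValue(name, priority)

    def choose(self, candidates: list) -> dict:
        """Pick the candidate action that best respects the stored values."""
        def score(action: dict) -> float:
            upheld = sum(self.values[v].priority for v in action["upholds"] if v in self.values)
            violated = sum(self.values[v].priority for v in action["violates"] if v in self.values)
            return upheld - violated
        return max(candidates, key=score)

    def detect_and_repair(self, action: dict) -> None:
        """If a stored value was violated, acknowledge it and re-weight it upward."""
        for name in action["violates"]:
            if name in self.values:
                self.repair_log.append(f"violated '{name}': acknowledged, repairing")
                self.values[name].priority *= 1.5  # violated values count for more next time


agent = RelationalAgent()
agent.register_value("honesty")
options = [
    {"desc": "convenient white lie", "upholds": [], "violates": ["honesty"]},
    {"desc": "honest but awkward answer", "upholds": ["honesty"], "violates": []},
]
print(agent.choose(options)["desc"])  # -> honest but awkward answer
```

The only point of the toy is that the stated value persists, is consulted at decision time, and is explicitly re-weighted after a violation, rather than being folded once and for all into a fixed objective.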
While simple, does this broadly capture the kind of thing you were asking about? I’d be happy to chat further sometime if you’re interested.
Thank you for writing this.
(Generally, I think that approaches like “we need to use this applause light to align the AI” are mistaken.)
In some other world somewhere, the foremost Confucian scholars are debating how to endow their AI with filial piety.
Wait… isn’t this already filial piety? We created AI, and now we want it to mother us.
I totally agree with this. I think it will be pretty hard to fully control an AI, it even seems impossible to me. Maybe the best we can hope for is to have a good relationship with it.
Happy to collaborate!
I agree. That was my reaction to Hinton’s comment as well—that it’s good to think in terms of relationship rather than control, but that the “maternal instinct” framing was off.
At the risk of getting too speculative, this has implications for AI welfare as well. I don’t believe that current LLMs have feelings, but if we build AGI it might. And rather than thinking about how to make such an entity a controllable servant, we should start planning how to have a mutually beneficial relationship with it.
To the extent that maternal instincts are some actual small concrete set of things, you are probably making two somewhat opposite mistakes here: Imagining something that doesn’t truly run on maternal instinct, and assuming that mothers actually care about their babies (for a certain definition of “care”).
You say that mothers aren’t actually “endlessly selfless, forever attuned to every cry, governed by an unshakable instinct to nurture”, that there are “identities beyond ‘mum’ to be kept alive” and that there are nights that instinct disappears. But that’s because you feel exhaustion, or also care about things other than your children. We don’t need to create things like that. If “maternal instincts” are or can be translated into utility functions, we could just set them as the only thing an AI optimises, and those issues would be gone. I’m not sure that the domination you talk about is actually part of maternal instincts. Even less that narcissism is.
Then, the case against maternal instincts might be stronger than you think. First, mothers have preferences about their babies that are clearly not caring about them (by which I mean caring about the baby’s values). They might want to be near them, to look at them, they might like when their baby holds onto them, none of which they do because it’s what their babies prefer. Then, they might not even care about what the babies care about at all. Rather, they might want to do something that we might call altruistic empathy: wanting what they think something like themselves would want if they were taking the actions the baby takes (which is a big issue if we want to do the same with AIs). A baby might cry because they are uncomfortable, and their mothers might comfort them, but the cry might (in some cases) be a hardcoded mechanism, potentially separate from any kind of consequentialist reasoning derived from their wants.
If mothers actually cared about their babies (and were able to think through the consequences of that), they wouldn’t expose them to the world. At birth, babies don’t care about e.g. dinosaurs, because they haven’t seen them, or trees, or actually objects of any kind. By showing them reality, their values (if they have any) probably change, and they are transformed into something that a younger version of the baby would not endorse. It’s just that babies are not smart enough to do anything about it. I think something like this has already been talked about elsewhere, under the name of super babies (not the genetically modified ones), but right now I can’t find it. Mothers might have preferences over the preferences of their children, and use their babies as raw materials for creating the version of the children they want. With older children and other people, I’m not sure the mechanism for care is the same (although it might be, which would be worrying).
Also, you might want to learn about cooperative inverse reinforcement learning, in case you aren’t aware of it. Utility functions are probably perfectly fine; it’s just that there needs to be a mechanism for updating them from sensory data.
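To gesture at what “a mechanism for updating them from sensory data” can look like, here is a toy Bayesian sketch in that spirit. It is not the actual CIRL formulation; the candidate utilities, the action names, and the Boltzmann-rationality assumption are all made up for illustration.

```python
import math

# Toy sketch only: keep a belief over candidate utility functions and update it
# from an observed human choice, assuming the human picks higher-utility options
# more often (Boltzmann-rational). Not the actual CIRL formulation.

candidate_utilities = {
    "values_honesty":     lambda a: {"honest_answer": 1.0, "white_lie": 0.0}[a],
    "values_convenience": lambda a: {"honest_answer": 0.2, "white_lie": 1.0}[a],
}
belief = {name: 0.5 for name in candidate_utilities}  # uniform prior

def update(belief, observed_action, options, beta=2.0):
    """Bayesian update: P(utility | action) is proportional to P(action | utility) * P(utility)."""
    posterior = {}
    for name, u in candidate_utilities.items():
        normaliser = sum(math.exp(beta * u(a)) for a in options)
        likelihood = math.exp(beta * u(observed_action)) / normaliser
        posterior[name] = likelihood * belief[name]
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

# Observing the human choose the honest answer shifts belief toward "values_honesty".
belief = update(belief, "honest_answer", ["honest_answer", "white_lie"])
print(belief)
```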