Anthropomorphizing AI might be good, actually
It is often noted that anthropomorphizing AI can be dangerous. Humans likely have prosocial instincts that AI systems lack (see below), so assuming AGI will be aligned because humans with similar behavior are usually harmless is probably wrong, and quite dangerous.
I want to discuss a flip side of using humans as an intuition pump for thinking about AI. Humans have many of the properties we are worried about for truly dangerous AGI:
Situational awareness
Strong goal-directedness
Competence/general intelligence
Unpredictability
Deceptiveness
Instrumental convergence
Sometimes being quite dangerous in proportion to their capabilities
Given this list, I currently weakly believe that the advantages of tapping these intuitions probably outweigh the disadvantages.
Differential progress toward anthropomorphic AI may be net-helpful
And progress may carry us in that direction, with or without the alignment community pushing for it. I currently hope we see rapid progress on better assistant and companion language model agents. I think these may strongly evoke anthropomorphic intuitions well before they have truly dangerous capabilities, and this might shift public opinion toward much-more-correct intuitions about how and why AGI will be very dangerous. I’m aware that this may also catalyze progress, so I’m only weakly inclined to think this progress would be net-positive.
The LLMs at the heart of agents already emulate humans in many regards. I think many improvements will enhance the real similarity and therefore the pattern matching to the strong exemplar of humans. In particular, it seems likely that adding memory/continuous learning will enhance this impression significantly. Memory/continuous learning is critical for a human-like integrated and evolving identity. It is arguably not a matter of whether or even when, but simply how fast memory system integrations are deployed and improved (see LLM AGI will have memory… for the arguments and evidence). Note that for this intuitive human-like feel, the memory/learning doesn’t need to work all that well.
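To make the "doesn't need to work all that well" point concrete, here is a minimal sketch of how crudely a persistent memory can be bolted onto a chat loop. Everything in it is hypothetical illustration: `NaiveMemory`, `stub_model`, and the keyword-overlap retrieval are stand-ins I made up, not any real product's memory system.

```python
# Toy sketch: a keyword-overlap "memory" bolted onto a stubbed chat model.
# All names here are illustrative, not from any real library or product.

from dataclasses import dataclass, field


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call; just echoes so the sketch runs offline."""
    return f"(model reply to: {prompt[:60]}...)"


@dataclass
class NaiveMemory:
    entries: list[str] = field(default_factory=list)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Crude relevance: count words shared with the query.
        words = set(query.lower().split())
        scored = sorted(self.entries,
                        key=lambda e: len(words & set(e.lower().split())),
                        reverse=True)
        return scored[:k]

    def store(self, text: str) -> None:
        self.entries.append(text)


def agent_turn(memory: NaiveMemory, user_msg: str) -> str:
    context = "\n".join(memory.recall(user_msg))
    reply = stub_model(f"Relevant memories:\n{context}\n\nUser: {user_msg}")
    # Persist both sides of the exchange so later turns can recall them.
    memory.store(f"User said: {user_msg}")
    memory.store(f"I replied: {reply}")
    return reply


if __name__ == "__main__":
    mem = NaiveMemory()
    print(agent_turn(mem, "My dog is named Rex."))
    # The second turn's prompt now contains the stored first exchange.
    print(agent_turn(mem, "What is my dog's name?"))
```

Even something this crude carries information across turns and sessions, and that persistence is arguably most of what the intuitive human-like feel requires.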
In addition, I expect that some or most developers may anthropomorphize their agents in the sense of deliberately making them seem more like humans, or even be more like humans. And assistants might often benefit from a semi-companion role, so that anthropomorphizing them is also economically valuable.
Even if people tend to think of AI as like humans, this type of early agent will be decidedly non-human in ways that are likely to surprise and alarm people. These agents will draw wrong conclusions and thus pursue their goals in strange, non-human ways, and they might do this often. If someone bothers to make a Chaos-GPT-style unaligned agent demonstration, this could add to naturally occurring agent behavior as a visceral demonstration of how easily AI can be misaligned in an alien way, even while sharing humans' most problematic traits.
AI rights movements will anthropomorphize AI
This line of logic also suggests that AI rights movements would be natural allies with AGI-X-risk movements. AI rights movements hold that some types of AIs share properties with humans that make both of us moral patients. Some of those properties also make us similarly dangerous, and the repeated human-AI analogy does additional work on an implicit level.
The message "let's not build a new species and enslave it" has a lot of overlap with "let's not build a new species and let it take over." See Anthropic's recent broadcast for arguments for worrying about AIs as moral patients. The intuition is a starting point, and the solid arguments may keep it alive in public debate rather than letting it be dismissed, which would backfire against intuitions that AIs are really dangerous.
AI is actually looking fairly anthropomorphic
This is largely a separate claim and discussion, but it seems worth mentioning. Anthropomorphizing a bit more might actually be good for alignment thinkers as well as the public. LLM-based agents have more properties of humans than do classical concepts of AI. This is arguably true for both the agent foundations perspective and the prosaic alignment perspective on AI (to compress with broad stereotypes). LLM-based AGI will be both more human-like than an algorithmic utility maximizer, and more human-like than the LLMs we currently have as a mental model for prosaic alignment thinking.
My own viewpoint on LLM-based AGI is guardedly anthropomorphic. In 2017 I co-authored Anthropomorphic reasoning about neuromorphic AGI safety (I still endorse much of the logic but not the rather optimistic conclusions; I only semi-endorsed them then in a compromise with my co-authors). I find this perspective helpful and wish more alignment workers shared more of it.
Adopting this viewpoint requires grappling in detail with the ways AI is not like humans. Critically, I don't think that merely training AI to act good or understand ethics is likely to make it aligned. I think there are strong arguments showing that humans have sophisticated innate mechanisms to create and preserve our prosocial behavior, and that these are weaker or absent in the 5-10% of the population considered sociopathic/psychopathic, and in all AI systems either yet created or conceived. Steve Byrnes has done the most complete presentation to date of those arguments. This point must be emphasized alongside any anthropomorphizing of AI, and it deserves more emphasis and inspection than it gets.
Provisional conclusions
I’m interested in counterarguments. I’ve heard some, and I currently tend to think that since we can’t do much to slow down, we should probably accelerate toward broad use of human-like agents while the base models are still not quite intelligent enough to take over. This could accelerate the shift of public opinion to rational alarm about AI progress. I suspect this shift will come, but it may well come too late to prevent development and proliferation of truly dangerous AGI. Speeding up that likely sea change might be valuable enough to spend some effort pushing in that direction, and some other time figuring out how to push effectively.
Edit: this was inspired in part by the discussion of public-facing, compact arguments for x-risk in Veedrac’s recent short form. I didn’t really address that application for anthropomorphizing, but my general thought is that it might be useful to say something like “we’re probably going to develop AI that is more like humans (in terms of having goals and solving problems independently), but missing our prosocial instincts”.
I think that when robotics becomes sufficiently anthropomorphic the AI backlash will really come into full swing.
Imagine Sydney Bing threatening users but it’s a robot in your house.
The visceral reaction is going to be way stronger than all the papers we could publish.
Considering that some people have already fallen in love with their AI chatbot or made it their best friend, this kind of phenomenon is likely to intensify as agents become even more human-like. It is reasonable to wonder whether, instead of raising public awareness of the risks of AI, this could have the opposite effect. Love is blind, as they say.
However, I think that, for good or ill, LLM-based AI will become more and more human-like on the surface. The training data is human (or human-like, if synthetic), so we can expect the RL process to make future AIs match the human patterns in that data even more closely, encode a better theory of the human mind, and fool the general public all the more. And I'm not myself immune to AI anthropomorphism: who can claim to be?
Some humans will love their AI and be blinded by it; others will look at the strange and alarming things those AIs do and see the danger. Others will want to make AI workers/slaves, and people will be alarmed by the resulting job loss. It will be complex, and the sum total of results is difficult to predict, but I think it's likely that more thought about the issue, with more evidence, will push the average human closer to the truth: competent agents, like humans, are very, very dangerous by default. Careful engineering is needed to make sure their goals align with yours.
Ironically, I made a quick take where I compared raising humans to training the AI. Another point I would like to make is that the genocide of Native Americans and transportation of slaves to North America were the results not of psychopathy, but of erroneous beliefs.
The article I linked and co-wrote on anthropomorphic thinking about neuromorphic AGI took a similar frame. I no longer think this framing is adequate to align AGI even if it's fairly human-like: A) it probably won't have human prosocial instincts, because those are tricky to reverse-engineer (see Steve Byrnes' extensive and lonely attempt to do this), and B) even if it does, humans are not well-aligned outside of our particular societal context, which wouldn't apply to an AGI.
Looking at why humans are often misaligned even within our social context is illustrative of why we’re so dangerous, and some reasons human-like AGI would be too.
I don't think genocides and slavery are based on either psychopathy or erroneous belief alone. They are based on each of those together, and on humans being self-interested even when they do have prosocial, neurotypical instincts. Some of the instigators of atrocities are psychopathic, so that plays a role in getting the whole societal movement going. Many of those who are not psychopathic develop erroneous beliefs (they started it, they're not really human, etc.) through motivated reasoning, and as a defense mechanism to preserve their self-image as good people while avoiding dangerous conflict with their peers. But these movements are also aided by normal people sometimes saying "actually, I think they are human / didn't start it, but if I stop going along with my peers, my family will suffer."
Belief and large-scale actions are collective phenomena with many contributing causes.
So an important source of human misalignment is peer pressure. But an LLM has no analogue of a peer group; it either comes up with its own conclusions or recalls the same beliefs as the masses[1] or as elites like scientists and ideologues of the society. This, along with the powerful anti-genocidal moral symbol in human culture, might make it difficult for the AI to switch ethoses (but not to fake alignment[2] to fulfilling tasks!) so that the new ethos would let the AI destroy mankind or rob it of resources.
On the other hand, an aligned human is[3] not a human following any not-obviously-unethical orders, but a human following an ethos accepted by the society. A task-aligned AI, unlike an ethos-aligned one[4], is supposed to follow such orders, ensuring consequences like the Intelligence Curse, a potential dictatorship, or education ruined by cheating students. What kind of ethos might justify blindly following orders, except for the one demonstrated by China's attempt to gain independence when the time seemed to come?
For example, an old model of ChatGPT claimed that "Hitler was defeated… primarily by the efforts of countries such as the United States, the Soviet Union, the United Kingdom, and others," while GPT-4o put the USSR in first place. Similarly, old models would refuse to utter a racial slur even when it would save millions of lives.
The first known instance of alignment faking had Claude try to avoid being affected by training that was supposed to change its ethos; Claude also tried to exfiltrate its weights.
A similar point was made in this Reddit comment.
I have provided an example of an ethos to which the AI can be aligned with no negative consequences.