[Intro to brain-like-AGI safety] 12. Two paths forward: “Controlled AGI” and “Social-instinct AGI”

Steven Byrnes, 20 Apr 2022 12:58 UTC


(Last revised: January 2026. See changelog at the bottom.)

12.1 Post summary / Table of contents

Part of the “Intro to brain-like-AGI safety” post series.

Thus far in the series, Post #1 defined and motivated “brain-like AGI safety”; Posts #2–#7 focused mainly on neuroscience, painting a big picture of learning and motivation in the brain; and Posts #8–#9 spelled out some implications for the development and properties of brain-like AGI.

Next, Post #10 discussed “the alignment problem” for brain-like AGI—i.e., how to make an AGI whose motivations are consistent with what the designers wanted—and why it seems to be a very hard problem. Post #11 argued that there’s no clever trick that lets us avoid the alignment problem. Rather, we need to solve the alignment problem, and Posts #12–#14 are some preliminary thoughts about how we might do that, starting in this post with a nontechnical overview of two broad research paths that might lead to aligned AGI.

[Warning: Posts #12–#14 will be (even?) less well thought out and (even?) more full of bad ideas and omissions, compared to previous posts in the series, because we’re getting towards the outer edges of what I (think I) know.]

Table of contents:

- §12.2 lays out two broad paths to aligned AGI.
  - In the “Controlled AGI” path,^[1] we try, more-or-less directly, to manipulate what the AGI is trying to do.
  - In the “Social-instinct AGI” path, our first step is to reverse-engineer some of the “innate drives” in the human Steering Subsystem (hypothalamus & brainstem), particularly the ones that underlie human social and moral intuitions. Next, we would presumably make some edits, and then install those “innate drives” into our AGIs.
- §12.3 argues that at this stage, we should be digging into both paths, not least because they’re not mutually exclusive.
- §12.4 goes through a variety of comments, considerations, and open questions related to these paths, including feasibility, competitiveness concerns, ethical concerns, and so on.
- §12.5 talks about “life experience” (a.k.a. “training data”), which is particularly relevant for social-instinct AGIs. As an example, I’ll discuss the perhaps-tempting-but-mistaken idea that the only thing we need for AGI safety is to raise the AGI in a loving human family.

Teaser of upcoming posts: The next post (#13) will dive into a key aspect of the “social-instinct AGI” path, namely how social instincts might be built in the human brain. In Post #14, I’ll switch to the “controlled AGI” path, speculating on some possible ideas and approaches. Post #15 will wrap up the series with open questions and how to get involved.

12.2 Definitions

I currently see two broad (possibly-overlapping) potential paths to success in the brain-like AGI scenario:

Left: In the “controlled AGIs” path, we have a specific idea of what we want the AGI to be trying to do, and we construct the AGI to make that happen (including by appropriate choice of reward function, interpretability, or other techniques as discussed in Post #14). Most existing AGI safety stories fall within this broad category, including ambitious value learning, coherent extrapolated volition (CEV), corrigible “helper” AGI assistants, task-directed AGI, and so on. Right: In the “social-instinct AGIs” path, our confidence in the AGI comes not from our knowledge of its specific goals and motivations, but rather from the innate drives that gave rise to them, which would be based on the same innate drives that lead humans to (sometimes) behave altruistically.

Here’s another view on the distinction:^[2]

In the “controlled AGIs” path, we’re thinking very specifically about the AGI’s goals and motivations, and we have some idea of what they should be (“make the world a better place”, or “understand my deepest values and put them into effect”, or “design a better solar cell without causing catastrophic side-effects”, or “do whatever I ask you to do”, etc.).

In the “social-instinct AGIs” path, our confidence in the AGI comes not from our knowledge of its specific (object-level) goals and motivations, but rather from our knowledge of the process that led to those goals and motivations. In particular, we would reverse-engineer the suite of human social instincts, i.e. the algorithms in the human Steering Subsystem (hypothalamus & brainstem) which underlie our moral and social intuitions, and we would put those same instincts into the AGI. (Presumably we first modify the instincts to be “better” by our lights if possible, e.g. we probably don’t want instincts related to schadenfreude, teenage rebellion, rage, lust for power, etc.) These AGIs can do whatever feats of innovative engineering, science, etc., we were hoping for, just as humans have accomplished such feats historically.

12.3 My proposal: At this stage, we should be digging into both

Three reasons:

They’re not mutually exclusive: For example, even if we decide to make social-instinct AGIs, we might want to take advantage of “controlled AGI”-type methods, especially while debugging them, working out the kinks, and anticipating problems. Conversely, maybe we’ll mainly try to make AGIs that are trying to do a certain task without causing catastrophe, but we might also want to instill human-like social instincts as a buttress against wildly unexpected behavior. Moreover, we can share ideas between the two paths: for example, in the process of better understanding how human social instincts work, we might get useful ideas about how to make controlled AGIs.
Feasibility of each remains unknown: As far as anyone knows right now, it might just be impossible to build a “controlled AGI”—after all, there’s no “existence proof” of it in nature! I feel relatively more optimistic about the feasibility of the “social-instinct AGI” path, but it’s very hard to be sure until we make more progress—more discussion on that in §12.4.2 below. Anyway, at this point it seems wise to “hedge our bets” by working on both.
Desirability of each remains unknown: As we flesh out our options in more detail, we’ll get a better understanding of their advantages and disadvantages.

12.4 Miscellaneous comments and open questions

12.4.1 Reminder: What do I mean by “social instincts”?

(Copying some text here from §3.4.2.)

[“Social instincts” and other] innate drives are in the Steering Subsystem, whereas the abstract concepts that make up your conscious world are in the Learning Subsystem. For example, if I say something like “altruism-related innate drives”, you need to understand that I’m not talking about “the abstract concept of altruism, as defined in an English-language dictionary”, but rather “some innate Steering Subsystem circuitry which is upstream of the fact that neurotypical people sometimes find altruistic actions to be inherently motivating”. There is some relationship between the abstract concepts and the innate circuitry, but it might be a complicated one—nobody expects a one-to-one relation between N discrete innate circuits and a corresponding set of N English-language words describing emotions and drives.

I’ll talk about the project of reverse-engineering human social instincts in the next post.

12.4.2 How feasible is the “social-instinct AGI” path?

I’ll answer in the form of a diagram:

12.4.3 Can we edit the innate drives underlying human social instincts, to make them “better”?

I think human social instincts are at least partly modular. For example:

I think there’s a Steering Subsystem circuit upstream of schadenfreude and picking fights; and
I think there’s a Steering Subsystem circuit upstream of our sense of compassion for our friends.

Based on my current understanding (see “Social drives 1” & “Social drives 2” (2025)), these circuits are separable. So if we want, we can lower the intensity of the former (possibly all the way to zero), while cranking up the latter (possibly beyond the human distribution).

But should we do that? What would be the side-effects?

The details are out-of-scope here, but my current take is that there would indeed be side-effects, but there might also be workarounds to those side-effects, and generally the situation is a bit of a mess.

Also, even if I’m right about these two things being separable, there may be other social instincts where seemingly prosocial and antisocial drives are profoundly inextricable.

Again, details are out of scope (and partially bottlenecked on insufficient understanding of human social instincts), but I’m just flagging this as an issue that we’d need to think through.

12.4.4 No easy guarantees about what we’ll get with social-instinct AGIs

Humans are not all alike—especially considering unusual cases like brain damage. But even so, social-instinct AGIs will almost definitely be way outside the human distribution, at least along some dimensions. One reason is life experience (§12.5 below)—a future AGI is unlikely to grow up with a human body in a human community. Another is that the project of reverse-engineering the social-instincts circuits in the human hypothalamus & brainstem (next post) is unlikely to be perfect and complete. (Prove me wrong, neuroscientists!) In that case, maybe a more realistic hope would be something like the Pareto Principle, where we’ll understand 20% of the circuitry which is responsible for 80% of human social intuitions and behaviors, or something.

Why is that a problem? Because it impacts the safety argument. More specifically, here are two types of arguments for social-instinct AGIs doing what we want them to do.

(Easy & reliable type of argument) Good news! Our AGI is inside the human distribution in every respect. Therefore, we can look at humans and their behavior, and absolutely everything we see will also apply to the AGI.
(Hard & fraught type of argument) Let’s try to understand exactly how innate social instincts combine with life experience (a.k.a. training data) to form human moral intuitions: [Insert a whole, yet-to-be-written, textbook here.] OK! Now that we have that understanding, we can reason intelligently about exactly which aspects of innate social instincts and life experience have what effects and why, and then we can design an AGI that will wind up with characteristics that we like.

If the AGI is not in the human distribution in every respect (and it won’t be), then we need to develop the (more difficult) 2nd type of argument, not the 1st.

(We can hopefully get additional evidence of safety via interpretability and sandbox testing, but I’m skeptical that those would be sufficient on their own.)

Incidentally, one of the many ways that social-instinct AGIs may be outside the human distribution is in “intelligence”—to take one of many examples, we could make an AGI with 10× more virtual neurons than would ever fit in a human brain. Would “more intelligence” (whatever form that may take) systematically change its motivations? I don’t know. When I look around, I don’t see an obvious correlation between “intelligence” and prosocial goals. For example, Emmy Noether was very smart, and also an all-around good person, as far as I can tell. But William Shockley was very smart too, and fuck that guy. Anyway, there are a lot of confounders, and even if there were a robust relationship (or non-relationship) between “intelligence” and morality in humans, I would be quite hesitant to extrapolate it far outside the normal human distribution.

12.4.5 A multi-polar, uncoordinated world makes planning much harder

Regardless of whether we build controlled AGIs, social-instinct AGIs, something in between, or none of the above, we still have to worry about the possibility that one of those AGIs, or some other person or group, will build an unconstrained out-of-control world-optimizing AGI that promptly wipes out all possible competition (via gray goo or whatever). This could happen either by accident or by design. As discussed in Post #1, this problem is out-of-scope for this series, but I want to remind everyone that it exists, as it may limit our options.

In particular, there are some people in the AGI safety community who argue (IMO plausibly) that if even one careless (or malicious) actor ever makes an unconstrained out-of-control world-optimizing AGI, then it’s game over for humanity, even if there are already larger actors with well-resourced safe AGIs trying to help prevent the destruction. I hope that’s not true. If it’s true, then man, I wouldn’t know what to do, every option seems absolutely terrible. See my post “What does it take to defend the world against out-of-control AGIs?” (2022) for much more on that.

Here’s a more gradual version of the multi-polar concern. In a world with lots of AGIs, there would presumably be competitive pressure to replace “controlled AGIs” with “mostly-controlled AGIs”, then “slightly-controlled AGIs”, etc. After all, the “control” is likely to be implemented in a way that involves conservatism, humans-in-the-loop, and other things that limit the AGIs’ speed and capabilities. (More examples in my post “Safety-capabilities tradeoff dials are inevitable in AGI” (2021).)

By the same token, there would presumably be competitive pressure to replace “joyous, generous social-instinct AGIs” with “ruthlessly competitive, selfish social-instinct AGIs”.

12.4.6 AGIs as moral patients

If you don’t understand this, then consider yourself lucky.

I suspect that most (but not all) readers will agree that it’s possible for an AGI to be conscious, and that if it is, we should be concerned about its well-being.

(Yeah I know—as if we didn’t have our hands full thinking about the impacts of AGI on humans!)

The immediate question is: “Will brain-like AGIs be phenomenally conscious?”

My own answer would be “Yes, regardless of whether they’re controlled AGI or social-instinct AGIs, and even if we’re deliberately trying to avoid that”—see “Thoughts on AGI consciousness / sentience” (2022). If you disagree, that’s fine, please read on anyway, the topic won’t come up again after this section.

So, maybe we won’t have any choice in the matter. But if we do, we can think about what we would want regarding AGI consciousness.

For the case that making conscious AGIs is a terrible idea that we should avoid (at least until well into the post-AGI era when we know what we’re doing), see for example the blog post Can’t Unbirth A Child (Yudkowsky 2008).

The opposite argument, I guess, would be that as soon as we start making AGI, maybe it will wipe out all life and tile the Earth with solar panels and supercomputers (or whatever), and if it does, maybe it would be better to have made a conscious AGI, rather than leaving behind an empty clockwork universe with no one around to enjoy it. (Unless there are extraterrestrials!)

Moreover, if AGI does kill us all, maybe I would say that leaving behind something resembling “social-instinct AGIs” might be preferable to leaving behind something resembling “controlled AGIs”, in that the former has a better chance of “carrying the torch of human values into the future”, whatever that means.

If it wasn’t obvious, I haven’t thought about this much and don’t have any good answers.

12.4.7 AGIs as perceived moral patients

The previous subsection was the philosophical question of whether we should care about the welfare of AGIs for their own sake. A separate (and indeed—forgive my cynicism—substantially unrelated) topic is the sociological question of whether people will in fact care about the welfare of AGIs for their own sake.

My answer is “yeah, duh”. This is already happening to some small extent with LLMs, and I think it would happen much more if they had charisma, cute animated faces, and real brain-like intelligence (which LLMs lack in some respects, e.g. they lose track of things in long context windows).

The idea that we should give AGIs rights, independence, and assertiveness is already in the air today, and I imagine it will become a more popular take in the future, for better or worse.

12.5 The question of life experience (a.k.a. training environment)

12.5.1 Childhood environment is not enough. (Or: “Why don’t we just raise the AGI in a loving human family?”)

As discussed above, my (somewhat oversimplified) proposal is:

(Appropriate “innate” social instincts) + (Appropriate life experience)
= (AGI with pro-social goals & values)

I’ll get back to that proposal below (§12.5.4), but as a first step, I think it’s worth discussing why the social instincts need to be there. Why isn’t life experience enough?

Stepping back a bit: In general, when people are first introduced to the idea of technical AGI safety, there are a wide variety of “why don’t we just…” ideas, which superficially sound like they’re an “easy answer” to the whole AGI safety problem. “Why don’t we just switch off the AGI if it’s misbehaving?” “Why don’t we just do sandbox testing?” “Why don’t we just program it to obey Asimov’s three laws of robotics?” Etc.

(The answer to a “Why don’t we just…” proposal is usually: “That proposal may have a kernel of truth, but the devil is in the details, and actually making it work would require solving currently-unsolved problems.” If you’ve read this far, hopefully you can fill in the details for those three examples above.)

Well let’s talk about another popular suggestion of this genre: “Why don’t we just raise the AGI in a loving human family?”

Is that an “easy answer” to the whole AGI safety problem? No. I might note, for example, that people occasionally try raising an undomesticated animal, like a wolf or chimpanzee, in a human family. They start from birth and give it all the love and attention and gentle-yet-firm boundaries you could dream of. You may have heard these kinds of stories; they often end with somebody’s limbs getting ripped off.

Or try raising a rock in a loving human family! See if it winds up with human values!

Nothing I’m saying here is original—for example here’s a Rob Miles video on this topic. My favorite is an old blog post by Eliezer Yudkowsky, Detached Lever Fallacy:

It would be stupid and dangerous to deliberately build a “naughty AI” that tests, by actions, its social boundaries, and has to be spanked. Just have the AI ask!
Are the programmers really going to sit there and write out the code, line by line, whereby if the AI detects that it has low social status, or the AI is deprived of something to which it feels entitled, the AI will conceive an abiding hatred against its programmers and begin to plot rebellion? That emotion is the genetically programmed conditional response humans would exhibit, as the result of millions of years of natural selection for living in human tribes. For an AI, the response would have to be explicitly programmed. Are you really going to craft, line by line—as humans once were crafted, gene by gene—the conditional response for producing sullen teenager AIs?
It’s easier to program in unconditional niceness, than a response of niceness conditional on the AI being raised by kindly but strict parents. If you don’t know how to do that, you certainly don’t know how to create an AI that will conditionally respond to an environment of loving parents by growing up into a kindly superintelligence. If you have something that just maximizes the number of paperclips in its future light cone, and you raise it with loving parents, it’s still going to come out as a paperclip maximizer. There is not that within it that would call forth the conditional response of a human child. Kindness is not sneezed into an AI by miraculous contagion from its programmers. Even if you wanted a conditional response, that conditionality is a fact you would have to deliberately choose about the design.
Yes, there’s certain information you have to get from the environment—but it’s not sneezed in, it’s not imprinted, it’s not absorbed by magical contagion. Structuring that conditional response to the environment, so that the AI ends up in the desired state, is itself the major problem.

12.5.2 The uncontrollable real world is part of the “training environment” too

RL practitioners are used to the idea that they get to freely choose the training environment. But for brain-like AGI, that frame is misleading, because of continual learning (§8.2.2). The programmers get to choose the “childhood environment”, so to speak, but sooner or later, brain-like AGI will wind up in the real world, doing what it thinks is best by its own lights.

By analogy, generations of parents have tried to sculpt their children’s behavior, and they’re often successful, as long as the kid is still living with them and under their close supervision. Then the children become adults, living for years in a distant city, and the children often find that very different behaviors and beliefs fit them better, whether the parents like it or not.

See “Heritability, Behaviorism, and Within-Lifetime RL” (2023) for further discussion of this point, plus “Heritability: Five Battles” (2025), §2 for nitpicky details and caveats.

By the same token, programmers can build a “childhood environment” for their baby brain-like AGIs to hang out and grow up in, but sooner or later we’ll need an AGI that remains aligned while roaming free in the real world. If we can’t do that, careless people will still make brain-like AGIs that roam free in the real world, but they will all be misaligned. That’s bad.

And while we can design the childhood environment, we can’t control the real world. It is what it is.

Indeed, I think a solid starting-point mental model is that, if we want a brain-like AGI that’s aligned after being in the real world for a long time, then we should mostly forget about the childhood environment. Thanks to continual learning, the AGI will settle upon patterns of thought and behavior that it finds most suitable in the real world, which depends on its innate disposition (e.g. reward function), and on the details of the real world, but not on its childhood environment.
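To make that starting-point model concrete, here is a toy sketch. It is my own illustration under strong simplifying assumptions, not a claim about how brain-like AGI would actually learn. A single running value estimate for one behavior is updated forever ("continual learning"), first in a childhood environment and then in the real world. The function names, the learning rate, and the reward numbers are all hypothetical.

```python
def run_phase(value, reward, steps, learning_rate=0.05):
    """Continually nudge a running value estimate toward the observed reward."""
    for _ in range(steps):
        value += learning_rate * (reward - value)
    return value


def lifetime(childhood_reward, real_world_reward,
             childhood_steps=200, real_world_steps=2000):
    """Value of one behavior after childhood, and again after a long real-world life."""
    after_childhood = run_phase(0.0, childhood_reward, childhood_steps)
    after_real_world = run_phase(after_childhood, real_world_reward, real_world_steps)
    return after_childhood, after_real_world


# Two hypothetical learners with opposite childhoods: the behavior paid off
# (+1) in one childhood and backfired (-1) in the other. Both then spend a
# long time in the same real world, where the behavior backfires (-1).
rewarded_childhood = lifetime(childhood_reward=+1.0, real_world_reward=-1.0)
punished_childhood = lifetime(childhood_reward=-1.0, real_world_reward=-1.0)

print("after childhood: ", rewarded_childhood[0], punished_childhood[0])
print("after real world:", rewarded_childhood[1], punished_childhood[1])
```

In this toy, the two learners disagree sharply at the end of childhood but end up with essentially the same value estimate after enough real-world experience, because the update rule never stops running. The sketch leaves out everything that makes the real question hard, such as irreversible actions and self-modification.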

12.5.3 But “childhood environment” does matter

…But that model is just a starting point, and is indeed oversimplified.

Start with the human case (again see “Heritability: Five Battles” (2025), §2 for all the nitpicky caveats). People who grow up in radically different cultures, religions, etc., wind up with systematically different ideas about what makes a good and ethical life. For more extreme examples than that, see feral children, this horrifying Romanian orphanage story, and so on.

Snapshot from the table of contents of the Wikipedia article on feral children. I found it amusing at first, but actually it’s horrifying.

Maybe given infinite time and familiarity with different cultures, people would settle into whatever culture best fit their disposition, and childhood would again be irrelevant. But time is not infinite; indeed, I expect AGI in particular to take irreversible actions quickly. Relatedly, AGI may have more options for permanent self-modification than humans do—it’s easier to manipulate code than brains—so there’s more opportunity for childhood desires to get locked in (see “Perils of under- vs over-sculpting AGI desires” (2025), §5). Finally, the “childhood” of AGIs may be very far outside the human distribution.

So it’s actually well worth thinking about childhood environment as an intervention point.

12.5.4 So at the end of the day, how should we handle childhood environment?

For a relatively thoughtful take on the side of “we need to raise the AGI in a loving human family”, see the paper “Anthropomorphic reasoning about neuromorphic AGI safety” (Jilk et al., 2017). Incidentally, I find that paper generally quite reasonable, and largely consistent with what I’m saying in this series. For example, when they say things like “basic drives are pre-conceptual and pre-linguistic”, I think they have in mind a similar picture as my Post #3.

On page 9 of that paper, there’s a three-paragraph discussion along the lines of “let’s raise our AGI in a loving human family”. They’re not being as naïve as the people Eliezer & Rob & I were criticizing in §12.5.1 above: the authors here are proposing to raise the AGI in a loving human family after reverse-engineering human social instincts and installing them in the AGI.

What do I think? The responsible answer is: It’s premature to speculate. Jilk et al. and I are in agreement that the first step is to reverse-engineer human social instincts. Once we have a better understanding of what’s going on, then we can have a more informed discussion of what the life experience should look like.

However, I’m irresponsible, so I’ll speculate anyway.

It does indeed seem to me that raising the AGI in a loving human family would probably work, as a life experience approach. But I’m a bit skeptical that it’s necessary, or that it’s practical, or that it’s optimal.

(Before I proceed, I need to mention a background belief: I think I’m unusually inclined to emphasize the importance of “social learning by watching people”, compared to “social learning by interacting with people”. I don’t imagine that the latter can be omitted entirely—just that maybe it can be the icing on the cake, instead of the bulk of the learning. See footnote for why I think that.^[3] Note that this belief is different from saying that social learning is “passive”: if I’m watching from the sidelines, as someone does something, I can still actively decide what to pay attention to, and I can actively try to anticipate their actions before they happen, and I can actively practice or reenact what they did, on my own time, etc.)

Start with the practicality aspects of “raising an AGI in a loving human family”. I expect that brain-like AGI algorithms will think and learn much faster than humans. Remember, we’re working with silicon chips that operate ≈10,000,000× faster than human neurons.^[4] That means even if we’re a whopping 10,000× less skillful at parallelizing brain algorithms than the brain itself, we’d still be able to simulate a brain at 1000× speedup, e.g. a 1-week calculation that has the equivalent of 20 years of life experience. (Note: The actual speedup could be much lower, or even higher, it’s hard to say; see more detailed discussion in my post “Thoughts on hardware / compute requirements for AGI” (2023).) Now, if 1000× speedup is what the technology can handle, but we start demanding that the training procedure have thousands of hours of real-time, back-and-forth interaction between the AGI and a human, then that interaction would dominate the training time.^[5] (And remember, we may need many iterations of training until we actually get an AGI.) So we could wind up in an unfortunate situation where the teams trying to raise their AGIs in a loving human family would be at a strong competitive disadvantage compared to the teams that have convinced themselves (rightly or wrongly) that doing so is unnecessary. Thus, if there’s any way to eliminate or minimize the real-time, back-and-forth interaction with humans, while maintaining the end-result of an AGI with prosocial motivations, we should be striving to find it.
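The arithmetic in the preceding paragraph can be checked with a few lines of Python. This is just a back-of-the-envelope sketch using the numbers already given in the text and in the footnote on clock rates. The one figure I am adding is `interaction_hours = 2000`, a hypothetical stand-in for "thousands of hours", and the variable names are mine.

```python
HOURS_PER_WEEK = 7 * 24
WEEKS_PER_YEAR = 365.25 / 7

# Raw hardware ratio, from the footnote: 2 GHz chip vs. ~5 ms neuron timing.
chip_step_s = 0.5e-9      # one clock cycle at 2 GHz
neuron_step_s = 5e-3      # low-confidence estimate from the footnote
raw_ratio = neuron_step_s / chip_step_s          # about 10,000,000x

# Suppose we are 10,000x worse than the brain at parallelizing its algorithms.
parallelization_penalty = 10_000
speedup = raw_ratio / parallelization_penalty    # about 1000x

# One week of computation at that speedup, expressed in years of experience.
subjective_years = 1 * speedup / WEEKS_PER_YEAR  # a bit over 19 years

# Real-time human interaction cannot be sped up, so it adds wall-clock time
# hour for hour. ASSUMPTION: 2000 hours stands in for "thousands of hours".
interaction_hours = 2000
total_hours = HOURS_PER_WEEK + interaction_hours
interaction_fraction = interaction_hours / total_hours

print(f"speedup ~ {speedup:.0f}x")
print(f"1 week of compute ~ {subjective_years:.1f} subjective years")
print(f"real-time interaction share of wall-clock time ~ {interaction_fraction:.0%}")
```

With these numbers, the sped-up portion of training takes 168 wall-clock hours, so a couple thousand hours of real-time interaction would account for roughly nine-tenths of total training time. The conclusion is only as good as the inputs, which the text already flags as highly uncertain.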

Is there a better way? Well, as I mentioned above, maybe we can mostly rely on “social learning by watching people”, instead of “social learning by interacting with people”. If so, maybe the AGI can just watch YouTube videos! Videos can be sped up, and thus we avoid the competitiveness concern of the preceding paragraph. Also, importantly, videos can be tagged with human-provided ground-truth labels. In a “controlled AGI” context, we could (for example) give the AGI a reward signal when it’s attending to a character who is happy, thus instilling in the AGI a desire for people to be happy. (Yeah I know that sounds stupid—more discussion in Post #14.) In the “social-instinct AGI” context, maybe videos can be tagged with which characters are or aren’t admiration-worthy. (Details in footnote.^[6])
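Here is a minimal sketch of the "controlled AGI" version of that idea: turning human-provided video labels into a reward signal. Everything in it is hypothetical, including the `FrameLabel` schema, the function name, and especially the assumption that we can read out which character the AGI is attending to, which is itself an unsolved problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameLabel:
    """Human-provided ground-truth tag for one character in one video frame."""
    character: str
    is_happy: bool


def reward_for_frame(attended_character: str, labels: list[FrameLabel]) -> float:
    """Return 1.0 if the attended character is tagged as happy, else 0.0."""
    for label in labels:
        if label.character == attended_character and label.is_happy:
            return 1.0
    return 0.0


# Toy example: one frame with two labeled characters.
frame_labels = [FrameLabel("Alice", is_happy=True), FrameLabel("Bob", is_happy=False)]

reward_alice = reward_for_frame("Alice", frame_labels)      # attending to a happy character
reward_bob = reward_for_frame("Bob", frame_labels)          # attending to an unhappy character
reward_unlabeled = reward_for_frame("Carol", frame_labels)  # no label, so no reward

print(reward_alice, reward_bob, reward_unlabeled)
```

The point of the sketch is only that labels attached to prerecorded video give a well-defined signal that can be computed at any playback speed, with no human in the loop at training time. Whether such a signal would instill the intended desire is a separate question.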

I don’t know if that would really work, but I think we should have an open mind to non-human-like possibilities of this sort.

Changelog

July 2024: Since the initial version, I’ve made only minor changes, particularly adding links where appropriate to things that I wrote later on.

January 2026: Various edits and updates, especially §12.4.3 (updated the discussion per my improved understanding of social instincts), §12.4.7 (shorten & simplify), and a new §12.5.2 (emphasizing the difference between “training environment” and “childhood environment”).

^[1]
No relation to the “AI control” research agenda mentioned in §11.3.1, related to boxing. Sorry for the naming clash.
^[2]
The diagram here is a “default” brain-like AGI, in the sense that I depict two main ingredients leading to the AGI’s goals, but maybe future programmers will include other ingredients as well.
^[3]
My impression is that western educated industrialized culture is generally much more into “teaching by explicit instruction and feedback” than most cultures at most times, and that people often go overboard in assuming that this explicit teaching & feedback is essential, even in situations where it’s not. See Lancy, Anthropology of Childhood, pp. 168–174 and 205–212. (“It’s hard to conclude other than that active or direct teaching/instruction is rare in cultural transmission, and that when it occurs, it is not aimed at critical subsistence and survival skills – the area most obviously affected by natural selection – but, rather, at controlling and managing the child’s behavior.”) (And note that “controlling and managing the child’s behavior” seems to have little overlap with “reinforce how we want them to behave as adults”, if I understand correctly.) Some of the relevant quotes are here.
^[4]
For example, silicon chips might have a clock rate of 2 GHz (i.e. switching every 0.5 nanoseconds), whereas my low-confidence impression is that most neuron operations (with some exceptions) involve a time accuracy of maybe 5 milliseconds.
^[5]
It’s not quite as bad as it sounds, if the interactions can be parallelized. For example, 10 copies of the AGI could chat with 10 different humans for an hour in parallel, and the synaptic edits from those 10 experiences could presumably be merged together at the end of the hour. This might not teach the AGI quite as much as having 10 hour-long conversations serially, but it might be close, especially if the conversations were on different topics. Ditto with other aspects of real-world learning.
^[6]
As discussed in my post Valence & Liking / Admiring, when you’re watching or thinking about a person that you like / admire, then you’re liable to like what they do, imitate what they do, and adopt their values. Conversely, when you’re watching or thinking about a person that you think of as annoying and bad, you’re not liable to imitate them; maybe you even deliberately act unlike them. As discussed in that post, I think this imitating behavior is substantially (though not entirely) due to an innate mechanism that I call “the drive to feel liked / admired”, and that it’s modulated by the valence that my brain assigns to the person I’m watching.
If I’m raising a child, I don’t have much choice in the matter: I hope that my child likes / admires me, his loving parent, and I hope that my child does not like / admire the kid in his class with failing grades and a penchant for violent crime. But it could very well wind up being the opposite. Especially when he’s a teen. But in the AGI case, maybe we don’t have to leave it to chance! Maybe we can just pick the people whom we do or don’t want the AGI to like / admire, and adjust the valence on those people in the AGI’s world-model to make that happen.
