Jonas Hallgren comments on Foom & Doom 2: Technical alignment is hard

Jonas Hallgren 24 Jun 2025 12:43 UTC
3 points
This is quite specific and only engages with section 2.3, but it made me curious.
I want to ask about a core assumption in your argument on human imitative learning. You claim that when humans imitate, this “always ultimately arises from RL reward signals”: that is, we imitate because we “want to,” even if unconsciously. Is this always the case, though?
Let me work through object permanence as a concrete case study. The standard developmental timeline shows infants acquiring this ability around 8-12 months through gradual exposure in cultural environments where adults consistently treat objects as permanent entities. What’s interesting is that this doesn’t look like reward-based learning: infants aren’t choosing to learn object permanence because it’s instrumentally useful. Instead, the acquisition pattern in A-not-B error studies (the best meta-study I could find; I’m taking the concept from the Cognitive Gadgets book) suggests they’re absorbing it through repeated exposure to cultural practices that embed object permanence as a basic assumption.
This raises a broader question about the mechanism. When we look at how language acquisition works, we see similar patterns—children pick up not just vocabulary but implicit cultural assumptions embedded in linguistic practices. The grammar carries cultural logic about agency, causation, social relations. Could object permanence be working the same way?
Heyes’ cognitive gadgets framework suggests this might be quite general. Rather than most cultural learning happening through explicit reward-optimization, maybe significant portions happen through what she calls “direct cultural transmission”—absorption of cognitive tools that are latent in the cultural environment itself.
This would have implications for your argument about prosocial behavior. If prosociality gets transmitted through the same mechanism as object permanence—absorbed from environments where it’s simply the default assumption rather than learned through reward signals—then the “green slice” of genuinely prosocial behavior might be more robust than RL-based accounts would predict.
The key empirical question seems to be: can we distinguish between “learning through rewards” and “absorbing through cultural immersion”? And if so, which mechanism accounts for more of human social development? And does this even matter for your argument? (Maybe, on a more mechanistic level, the striatum and the core control loop in the brain are still being activated when cultural information is learned, in a way that I’m not thinking of here, based on your Brain-Like AGI sequence?)
(I was going to include a bunch more literature stuff on this but I’m sure you can find stuff using deep research and that it will be more relevant to questions you might have.)
- Steven Byrnes 24 Jun 2025 18:11 UTC
  8 points
  Thanks! It’s a bit hard for me to engage with this comment, because I’m very skeptical about tons of claims that are widely accepted by developmental psychologists, and you’re not.
  So for example, I haven’t read your references, but I’m immediately skeptical of the claim that the cause of kids learning object permanence is “gradual exposure in cultural environments where adults consistently treat objects as permanent entities”. If people say that, what evidence could they have? Have any children been raised in cultural environments where adults don’t treat objects as permanent entities? Or what?
  (There’s a study that finds that baby chicks display behavior typical of object permanence with no exposure to any other animal, and indeed no exposure to situations where object permanence was even a good way to make predictions! I wrote about it last year in “Woods’ new preprint on object permanence”.)
  Also, putting that aside, “infants aren’t choosing to learn [blah] because it’s instrumentally useful” is different from what I was talking about. My claim is that “humans imitate other humans because they want to”. Now,
  - One reason that I might want to imitate you is because you show me how to do something that I had a preexisting desire to do.
    For example, I want an apple, but I don’t know where to find them, and then I see you getting an apple out of the cabinet, and then I go get an apple out of the same cabinet.
  - Another reason that I might want to imitate you is because I admire you, and so whatever you want to do, suddenly feels to me like a good idea, just by the very fact that you want to do it.
    For example, if all the cool kids in school start skateboarding, then I’m probably gonna start thinking that skateboarding is cool, and I will feel some desire to start skateboarding myself.
  The second one involves human social instincts. Human social instincts can lead directly to new desires, just as hunger can, including a desire to imitate (in certain cases). I’ve written about it a bit here and here, and hopefully I’ll have a better discussion in the near future.
  > If prosociality gets transmitted through the same mechanism as object permanence—absorbed from environments where it’s simply the default assumption rather than learned through reward signals—then the “green slice” of genuinely prosocial behavior might be more robust than RL-based accounts would predict.
  There is obviously no culture on Earth where people are kind and honest because it has simply never occurred to any of them that they could instead be mean or dishonest. So prosociality cannot be a “default assumption”. Instead, it’s a choice that people make every time they interact with someone, and they’ll make that choice based on their all-things-considered desires. Right? Sorry if I’m misunderstanding.
  - Jonas Hallgren 24 Jun 2025 19:51 UTC
    1 point
    I will fold on the general point here: it mostly doesn’t matter, since the motivations come from the steering subsystem anyhow, and as a consequence it is foundationally different from how LLMs learn.
    > There is obviously no culture on Earth where people are kind and honest because it has simply never occurred to any of them that they could instead be mean or dishonest. So prosociality cannot be a “default assumption”. Instead, it’s a choice that people make every time they interact with someone, and they’ll make that choice based on their all-things-considered desires. Right? Sorry if I’m misunderstanding.
    However, I’m not certain I agree with this point. If you’re in a fully cooperative game, is it really your choice to cooperate? If you’re an agent who uses functional or evidential decision theory and you choose to cooperate with yourself in a black-box prisoner’s dilemma, is that really a choice?
    Your initial imitations shape your steering system to some extent, so there could be culturally learnt social drives, no? I think culture might be conditioning the initial states of your learning environment, and that might still be an important part of how social drives are generated?
    I hope that makes sense and I apologise if it doesn’t.