That’s smart! When I started graduate school in psychology in 2013, mirror neurons felt like, colloquially, “hot shit”, but within a few years, people had started to cringe quite dramatically whenever the phrase was used. I think your reasoning in (3) is spot on.
Your example leads to fun questions like “how do I recognize juggling?”, including “what stimuli activate the concept of juggling when I do it?” vs. “what stimuli activate the concept of juggling when I see you do it?” Intuitively, nothing there seems to require that those be the same neurons, except the concept of juggling itself.
Empirically I would probably expect to see a substantial overlap in motor and/or somatosensory areas. One could imagine the activation pathway there is something like
visual cortex [see juggling] -> temporal cortex [concept of juggling] -> motor cortex [intuitions of moving arms]
And we’d also expect to see some kind of direct “I see you move your arm in x formation” -> “I activate my own processes related to moving my arm in x formation” pathway that bypasses the temporal cortex altogether.
And we could probably come up with more pathways that all cumulatively produce “mirror neural activity” which activates both when I see you do a thing and when I do that same thing. Maybe that’s a better concept/name?
Then the next thing I want to suggest is that the system uses human resolution of conflicting outcomes to train itself to predict how a human would resolve a conflict, and if it is higher than a suitable level of confidence, it will go ahead and act without human intervention. But any prediction of what a human would predict could be second-guessed by a human pointing out where the prediction is wrong.
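A minimal sketch of that loop, assuming a toy lookup-table “model” (all names here are hypothetical illustrations of the confidence-gated deferral idea, not anyone’s actual proposal):

```python
# Toy sketch of confidence-gated deferral: the system predicts how a human
# would resolve a conflict; above a confidence threshold it acts on its own,
# otherwise it asks the human and learns from the answer.
# All names here are hypothetical, and the "model" is just a lookup table.

CONFIDENCE_THRESHOLD = 0.95

class ConflictResolver:
    def __init__(self):
        # (conflict, resolution) examples gathered from human feedback
        self.training_data = []

    def predict(self, conflict):
        """Return (predicted_resolution, confidence). A real system would use
        a learned model; here we just tally past answers to the same conflict."""
        matches = [r for c, r in self.training_data if c == conflict]
        if not matches:
            return None, 0.0
        best = max(set(matches), key=matches.count)
        return best, matches.count(best) / len(matches)

    def resolve(self, conflict, ask_human):
        resolution, confidence = self.predict(conflict)
        if confidence >= CONFIDENCE_THRESHOLD:
            return resolution  # act without human intervention
        # defer: the human's answer becomes new training data
        resolution = ask_human(conflict)
        self.training_data.append((conflict, resolution))
        return resolution
```

Note that even when the system acts on its own, a human can still second-guess the prediction after the fact, which just becomes more training data.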
Agreed that whether a human understands the plan (and all the relevant outcomes; which outcomes are relevant?) is important, and harder than I first imagined.
You haven’t factored in the possibility that Putin gets deposed by forces inside Russia who are worried about a nuclear war. Conditional on the use of tactical nukes, that intuitively seems likely enough to materially lower p(kaboom).
American Academy of Pediatrics lies to us once again....
“If caregivers are wearing masks, does that harm kids’ language development? No. There is no evidence of this. And we know even visually impaired children develop speech and language at the same rate as their peers.”

This is a textbook case of the Law of No Evidence. Or it would be, if there wasn’t any Proper Scientific Evidence.
Is it, though? I’m no expert, but I tried to find Relevant Literature. Sometimes, counterintuitive things are true.
Blindness affects congenitally blind children’s development in different ways, language development being one of the areas less affected by the lack of vision.
Most researchers have agreed upon the fact that blind children’s morphological development, with the exception of personal and possessive pronouns, is not delayed nor impaired in comparison to that of sighted children, although it is different.

As for syntactic development, comparisons of MLU scores throughout development indicate that blind children are not delayed when compared to sighted children.

Blind children use language with similar functions, and learn to perform these functions at the same age as sighted children. Nevertheless, some differences exist up until 4;6 years; these are connected to the adaptive strategies that blind children put into practice, and/or to their limited access to information about external reality. However, these differences disappear with time (Pérez-Pereira & Castro, 1997). The main early difference is that blind children tend to use self-oriented language instead of externally oriented language.
I don’t know exactly where that leaves us evidentially. Perhaps the AAP is lying by omission by not telling us about things other than language that are affected by children’s sight.
That’s a bit different to the dishonesty alleged, though.
Still working my way through reading this series—it is the best thing I have read in quite a while and I’m very grateful you wrote it!
I feel like I agree with your take on “little glimpses of empathy” 100%.
I think fear of strangers could maybe be implemented without a steering subsystem circuit? (I should say up front that I don’t know more about developmental psychology/neuroscience than you do, but here’s my 2c anyway.) Put aside whether there’s another, more basic steering subsystem circuit for agency detection; we know that pretty early on, through some combination of instinct and learning from scratch, young humans and many animals learn there are agents in the world who move in ways that don’t conform to the simple rules of physics they are learning. These agents seem to have internally driven and unpredictable behavior, in the sense that their movement can’t be predicted by simple rules like “objects tend to move toward the ground unless something stops them” or “objects continue to maintain their momentum”. It seems like a young human could learn an awful lot of that from scratch, and even develop (in their thought generator) a concept of an agent.
Because of their unpredictability, agent concepts in the thought generator would be linked to thought assessor systems related to both reward and fear; not necessarily from prior learning derived from specific rewarding and fearful experiences, but simply because, as their behavior can’t be predicted with intuitive physics, there remains a very wide prior on what will happen when an agent is present.
In that sense, when a neocortex is first formed, most things in the world are unpredictable to it, and an optimally tuned thought generator+assessor would keep circuits active for both reward and harm. Over time, as the thought generator learns folk physics, most physical objects can be predicted, and it typically generates thoughts in line with their actual behavior. But agents are a real wildcard: their behavior can’t be predicted by folk physics, and so they are perceived the way every other object in the world used to be: unpredictable, and thus continually predicting both reward and harm in an opponent process that leads to an ambivalent and uneasy neutral. This story predicts that individual differences in reward and threat sensitivity would particularly govern the default reward/threat balance for otherwise unknown items. It might (I’m really REALLY reaching here) help to explain why attachment styles seem so fundamentally tied to basic reward and threat sensitivity.
As the thought generator forms more concepts about agents, it might even learn that agents can be classified with remarkable predictive power into “friend” or “foe” categories, or perhaps “mommy/carer” and “predator” categories. As a consequence of how rocks behave (with complete indifference towards small children), it’s not so easy to predict the behavior of, say, falling rocks with “friend” or “foe” categories. On the contrary, agents around a child are often not indifferent to children, making it simple for the child to predict whether favorable things will happen around any particular agent by classifying agents into “carer” or “predator” categories. These categories can be entirely learned: clusters of neurons in the thought generator that connect to reward and threat systems in the steering system and/or thought assessor. So then the primary task of learning to predict agents is simply learning whether good things or bad things happen around the agent, as judged by the steering system.
This story would also predict that, before the predictive power of categorizing agents into “friend” vs. “foe” categories has been learned, children wouldn’t know to place agents into these categories. They’d take longer to learn whether an agent is trustworthy or not, particularly so if they haven’t learned what an agent is yet. As they grow older, they get more comfortable with classifying agents into “friend” or “foe” categories and would need fewer exemplars to learn to trust (or distrust!) a particular agent.
Event is on tonight as planned at 7. If you’re coming, looking forward to seeing you!
I wrote a paper on another experiment by Berridge, reported in Zhang & Berridge (2009). Similar behavior was observed in that experiment, but the question explored was a bit different. They reported a behavioral pattern in which rats typically found moderately salty solutions appetitive and very salty solutions aversive. Put into salt deprivation, rats then found both solutions appetitive, but the very salty solution less so.
They (and we) took it as given that homeostatic regulation set a ‘present value’ for salt that was dependent on the organism’s current state. However, on that model, you would think salt-deprived rats would most prefer the extremely salty solution. But in either state, they prefer the moderately salty solution.
In a CABN paper, we pointed out this is not explainable when salt value is determined by a single homeostatic signal, but is explainable when neuroscience about the multiple salt-related homeostatic signals is taken into account. Some fairly recent neuroscience by Oka & Lee (and some older stuff too!) is very clear about the multiple sets of pathways involved. Because there are multiple regulatory systems for salt balance, the present value of these can be summed (as in your “multi-dimensional rewards” post) to get a single value signal that tracks the motivation level of the rat for the rewards involved.
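As a toy illustration of that summation (the numbers below are invented, chosen only to reproduce the qualitative behavioral pattern, and are not from the paper):

```python
# Toy model: the value of a salty solution is the sum of a state-independent
# palatability term and a need-weighted sodium term, as if each homeostatic
# signal contributes its own "present value". All numbers are made up for
# illustration only.

def salt_value(palatability, sodium_gain, sodium_need):
    return palatability + sodium_need * sodium_gain

MODERATE = dict(palatability=1.0, sodium_gain=1.0)
INTENSE  = dict(palatability=-2.0, sodium_gain=2.0)

# Normal state (no sodium need):
print(salt_value(**MODERATE, sodium_need=0.0))  # 1.0  -> appetitive
print(salt_value(**INTENSE,  sodium_need=0.0))  # -2.0 -> aversive

# Sodium-deprived state:
print(salt_value(**MODERATE, sodium_need=2.0))  # 3.0 -> most appetitive
print(salt_value(**INTENSE,  sodium_need=2.0))  # 2.0 -> appetitive, but less so
```

With a single homeostatic signal you can’t get this pattern; with two summed terms it falls out immediately, which is the point of the multi-signal account.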
Hey Steve, I am reading through this series now and am really enjoying it! Your work is incredibly original and wide-ranging as far as I can see—it’s impressive how many different topics you have synthesized.
I have one question on this post—maybe doesn’t rise above the level of ‘nitpick’, I’m not sure. You mention a “curiosity drive” and other Category A things that the “Steering Subsystem needs to do in order to get general intelligence”. You’ve also identified the human Steering Subsystem as the hypothalamus and brain stem.
Is it possible that things like a “curiosity drive” arise from, say, the way the telencephalon is organized, rather than from the Steering Subsystem itself? To put it another way, if the curiosity drive is mainly implemented as motivation to reduce prediction error, or to fill the neocortex, how confident are you in identifying this process with the hypothalamus+brain stem?
I think the way I buy the argument is something like “the steering system ultimately provides all rewards, and that would include reward from prediction error”. But then I wonder whether you’re implying some greater role for the hypothalamus+brain stem or not.
Very late to the party here. I don’t know how much of the thinking in this post you still endorse or are still interested in. But this was a nice read. I wanted to add a few things:
- since you wrote this piece back in 2021, I have learned there is a whole mini-field of computer science dealing with multi-objective reward learning. Maybe a good place to start there is https://link.springer.com/article/10.1007/s10458-022-09552-y
- The shard theory folks have done a fairly good job sketching out broad principles, but it seems to me that homeostatic regulation does a great job of modulating which values happen to be relevant at any one time. Xavier Roberts-Gaal recently recommended “Where do values come from?” to me, and that paper sketches out a fairly specific theory of how this happens (I think it might be that more of the homeostatic recalculation happens physiologically rather than neurologically, but I otherwise buy what they are saying)
- I continue to think the vmPFC is relevant because different parts of it are known to calculate the value of different aspects of stimuli, and this can be modulated by state from time to time. A recent paper in this area by Luke Chang & colleagues identifies a neural signature of reward.
At the moment I have two theories about how shards seem to be able to form consistent and competing values that don’t always optimize for some ultimate goal:
Overall, shard theory was developed to describe the behavior of human agents whose inputs and outputs are multi-faceted. I think something about this structure might facilitate the development of shards in many different directions. This seems different from modern deep RL agents; although they also can potentially have lots of input and output nodes, these are pretty finely honed to achieve a fairly narrow goal, and so in a sense it is not too much of a surprise that they seem to Goodhart on the goals they are given at times. In contrast, there’s no single terminal value or single primary reinforcer in the human RL system: sugary foods score reward points, but so do salty foods when the brain’s subfornical region indicates there’s not enough sodium in the bloodstream (Oka, Ye, & Zuker, 2015); water consumption also gets reward points when there’s not enough water. So you have parallel sets of reinforcement developing from a wide set of primary reinforcers, all at the same time.
As far as I know, a typical deep RL agent is structured hierarchically, with feedforward connections from inputs at one end to outputs at the other, and connections throughout the system reinforced with backpropagation. The brain doesn’t use backpropagation (though maybe it has similar or analogous processes); it seems to “reward” successful (in terms of prediction error reduction, or temporal/spatial association, or simply firing at the same time...?) connections throughout the neocortex, without those connections necessarily having to propagate backwards from some primary reinforcer.
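A crude sketch of the contrast: the classic textbook Hebbian rule below uses only locally available activity, with no error signal propagated back from a distant reinforcer (this is just the standard illustration, not a claim about what cortex actually computes):

```python
# Local Hebbian-style update: the weight change depends only on the activity
# of the two connected units, unlike backpropagation, where each update
# requires an error signal passed backward through the whole network.
# Classic textbook rule, shown only for contrast; real cortical learning
# rules are an open question.

def hebbian_update(weight, pre_activity, post_activity, learning_rate=0.01):
    """Units that fire together wire together: dw = eta * pre * post."""
    return weight + learning_rate * pre_activity * post_activity

w = 0.5
w = hebbian_update(w, pre_activity=1.0, post_activity=1.0)  # co-activity strengthens
w = hebbian_update(w, pre_activity=1.0, post_activity=0.0)  # no co-activity, no change
```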
The point about being better at credit assignment as you get older is probably not too much of a concern. It’s very high level, and to the extent it is true, mostly attributable to a more sophisticated world model. If you put a 40-year-old and an 18-year-old into a credit assignment game in a novel computer game environment, I doubt the 40-year-old will do better. They might beat a 10-year-old, but only to the extent the 40-year-old has learned very abstract facts about associations between objects which they can apply to the game. Speed it up so that they can’t use System 2 processing, and the 10-year-old will probably beat them.
I have pointed this out to folks in the context of AI timelines: Metaculus gives predictions for “weak AGI”, but I consider a hypothetical GATO-x which can generalize to a task (or many tasks) outside its training distribution to be AGI, yet a considerable way from an AGI with enough agency to act on its own.
OTOH it isn’t much reassurance if something as small as a batch script to keep it running is enough to bootstrap such a system into agency.
But the time between weak AGI and agentic AGI is a prime learning opportunity, and the lesson is that we should do everything we can to prolong that interval once weak AGI is invented.
Also, perhaps someone should study the necessary components for an AGI takeover by simulating agent behavior in a toy model. At the least you need a degree of agency, probably a self-model in order to recursively self-improve, and the ability to generalize. Knowing what the necessary components are might enable us to take steps to avoid having them all in one system at once.
If anyone has ever demonstrated, or even systematically described, what those necessary components are, I haven’t seen it done. Maybe it is an infohazard but it also seems like necessary information to coordinate around.
You mentioned in the pre-print that results were “similar” for the two color temperatures, and referred to the Appendix for more information, but it seems like the Appendix isn’t included in your pre-print. Are you able to elaborate on how similar the results in these two conditions were? In my own personal exploration of this area I have put a lot of emphasis on color temperature. Your study makes me adjust down the importance of color temperature, although it would be good to get more information.
A consolidated list of bad or incomplete solutions could have considerable didactic value—it could help people learn more about the various challenges involved.
Not sure what I was thinking about, but probably just that my understanding is that “safe AGI via AUP” would have to penalize the agent for learning to achieve anything not directly related to the end goal, and that might make it too difficult to actually achieve the end goal when e.g. it turns out to need tangentially related behavior.
Your “social dynamics” section encouraged me to be bolder sharing my own ideas on this forum, and I wrote up some stuff today that I’ll post soon, so thank you for that!
That was an inspiring and enjoyable read!
Can you say why you think AUP is “pointless” for Alignment? It seems to me attaining cautious behavior out of a reward learner might turn out to be helpful. Overall my intuition is it could turn out to be an essential piece of the puzzle.
I can think of one or two reasons myself, but I barely grasp the finer points of AUP as it is, so speculation on my part here might be counterproductive.
I would very much like to see your dataset, as a Zotero database or in some other format, in order to better orient myself to the space. Are you able to make this available somehow?