This is really interesting. It’s hard to speak too definitively about theories of human values, but for what it’s worth these ideas do pass my intuitive smell test.
One intriguing aspect is that, assuming I’ve followed correctly, this theory aims to unify different cognitive concepts in a way that might be testable:
On the one hand, it seems to suggest a path to generalizing circuits-type work to the model-based RL paradigm. (Shards, which bid for outcomes on a contextually activated basis, would be analogous to circuits, which contribute to prediction probabilities on a contextually activated basis.)
On the other hand, it also seems to generalize the psychological concept of classical conditioning (Pavlov’s salivating dog, etc.), which has tended to be studied over the short term for practical reasons, to arbitrarily (?) longer planning horizons. The discussion of learning in babies also puts one in mind of the unfortunate Little Albert Experiment, done in the 1920s:
For the experiment proper, by which point Albert was 11 months old, he was put on a mattress on a table in the middle of a room. A white laboratory rat was placed near Albert and he was allowed to play with it. At this point, Watson and Rayner made a loud sound behind Albert’s back by striking a suspended steel bar with a hammer each time the baby touched the rat. Albert responded to the noise by crying and showing fear. After several such pairings of the two stimuli, Albert was presented with only the rat. Upon seeing the rat, Albert became very distressed, crying and crawling away.
[...]
In further experiments, Little Albert seemed to generalize his response to the white rat. He became distressed at the sight of several other furry objects, such as a rabbit, a furry dog, and a seal-skin coat, and even a Santa Claus mask with white cotton balls in the beard.
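Read through a shard lens, Albert’s case looks like a learned association keyed on stimulus features rather than on the rat itself, so anything sharing those features partially reactivates it. The toy Python below illustrates this with a Rescorla-Wagner-style delta rule, a standard textbook model of classical conditioning that is swapped in here for illustration; nothing above commits to it. The feature labels, learning rate, number of pairings and the wooden-block control are all hypothetical choices, not data from Watson and Rayner.

```python
# Toy sketch of conditioning plus feature-based generalization.
# All feature labels, the learning rate and the number of pairings are
# hypothetical choices for illustration, not data from Watson & Rayner.

LEARNING_RATE = 0.2  # assumed; small enough that predictions converge


def predicted_fear(weights: dict[str, float], stimulus: set[str]) -> float:
    """Response strength = summed association of the stimulus's features."""
    return sum(weights.get(feature, 0.0) for feature in stimulus)


def pair_with_noise(weights: dict[str, float], stimulus: set[str],
                    outcome: float = 1.0, lr: float = LEARNING_RATE) -> None:
    """One Rescorla-Wagner-style update: every feature present moves by
    the same shared prediction error."""
    error = outcome - predicted_fear(weights, stimulus)
    for feature in stimulus:
        weights[feature] = weights.get(feature, 0.0) + lr * error


rat = {"white", "furry", "small_animal", "long_tail"}
weights: dict[str, float] = {}
for _ in range(5):  # "several pairings" of the rat with the loud noise
    pair_with_noise(weights, rat)

# Test stimuli that were never paired with the noise themselves.
test_stimuli = {
    "rat": rat,
    "rabbit": {"white", "furry", "small_animal", "long_ears"},
    "fur coat": {"furry", "garment"},
    "santa mask": {"white", "cotton_beard", "face"},
    "wooden block": {"wooden", "hard"},  # hypothetical control stimulus
}
fear = {name: predicted_fear(weights, s) for name, s in test_stimuli.items()}
for name, level in fear.items():
    print(f"{name:>12}: {level:.2f}")
```

With these made-up numbers the rat ends up near 1.0, the rabbit (three of four shared features) near 0.75, the coat and mask near 0.25, and the block at 0. That mirrors the graded generalization in the quote without claiming to model it accurately.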
A couple more random thoughts on stories one could tell through the lens of shard theory:
As we age, if all goes well, we develop shards with longer planning horizons. Planning over longer horizons requires more cognitive capacity (all else equal), and long-horizon shards do seem to have some ability to either reinforce or dampen the influence of shorter-horizon shards. This is part of the continuing process of “internally aligning” a human mind.
Introspectively, I think there is also an energy cost involved in switching between “active” shards. Software developers understand this as context-switching, actively dislike it, and evolve strategies to minimize it in their daily work. I suspect a lot of the biases you might categorize under “resistance to change” (projection bias, sunk cost fallacy and so on) have this as a factor.
I do have a question about your claim that shards are not full subagents. I understand that in general different shards will share parameters over their world-model, so in that sense they aren’t fully distinct — is this all you mean? Or are you arguing that even a very complicated shard with a long planning horizon (e.g., “earn money in the stock market” or some such) isn’t agentic by some definition?
Anyway, great post. Looking forward to more.
I currently guess that even the most advanced shards won’t have private world-models which they can query in relative isolation from the rest of the shard economy. Importantly, I didn’t want the reader to think that we’re positing a bunch of homunculi. Maybe I should have just written that.
But I also feel relatively ignorant about more advanced shard dynamics. While I can give interesting speculation, I don’t have enough evidence-fuel to make such stories actually knowably correct.
I currently guess that even the most advanced shards won’t have private world-models which they can query in relative isolation from the rest of the shard economy.
What’s your take on “parts work” techniques like IDC, IFS, etc. seeming to bring up something like private (or at least not completely shared) world models? Do you consider the kinds of “parts” those access as being distinct from shards?
I would find it plausible to assume by default that shards have something like differing world models, since we know from cognitive psychology that e.g. emotional states tend to activate memories congruent with them (it’s easier to remember negative things about your life when you’re upset than when you’re happy), and different emotional states tend to activate different shards.
I suspect that something like the Shadlen & Shohamy take on decision-making might be going on:
The proposal is that humans make choices based on subjective value [...] by perceiving a possible option and then retrieving memories which carry information about the value of that option. For instance, when deciding between an apple and a chocolate bar, someone might recall how apples and chocolate bars have tasted in the past, how they felt after eating them, what kinds of associations they have about the healthiness of apples vs. chocolate, any other emotional associations they might have (such as fond memories of their grandmother’s apple pie) and so on.
Shadlen & Shohamy further hypothesize that the reason why the decision process seems to take time is that different pieces of relevant information are found in physically disparate memory networks and neuronal sites. Access from the memory networks to the evidence accumulator neurons is physically bottlenecked by a limited number of “pipes”. Thus, a number of different memory networks need to take turns in accessing the pipe, causing a serial delay in the evidence accumulation process.
Under that view, I think that shards would effectively have separate world models, since each physically separate memory network suggesting that an action is good or bad is effectively its own shard; and since a memory network is a miniature world model, there’s a sense in which shards are nothing but separate world models.
E.g. the memory of “licking the juice tasted sweet” is a miniature world model according to which licking the juice lets you taste something sweet, and is also a shard. (Or at least it forms an important component of a shard.) That miniature world model is separate from the shard/memory network/world model holding instances of times when adults taught the child to say “thank you” when given something; the latter shard only has a world model of situations where you’re expected to say “thank you”, and no world model of the consequences of licking juice.
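The bottleneck idea in the Shadlen & Shohamy passage can be illustrated with a toy simulation in which each memory network is treated as a separate shard carrying its own evidence about the apple-versus-chocolate choice, and only `n_pipes` of them can reach the accumulator per time step. This is a hypothetical sketch, not Shadlen & Shohamy’s actual model: the network names follow the apple example, but the integer evidence values, the round-robin access order, the `decide` function and the threshold are assumptions chosen to keep the arithmetic exact.

```python
from itertools import cycle, islice

# Hypothetical evidence each memory network contributes per access.
# Positive values favour the apple, negative values favour the chocolate bar.
MEMORY_NETWORKS: dict[str, int] = {
    "taste": -3,
    "after-effects": +2,
    "health beliefs": +4,
    "grandmother's apple pie": +3,
}


def decide(networks: dict[str, int], n_pipes: int, threshold: int,
           max_ticks: int = 1000) -> tuple[str, int]:
    """Accumulate evidence until |total| reaches the threshold.

    Each tick, only `n_pipes` memory networks (taken round-robin) can send
    their evidence to the accumulator; the rest must wait their turn.
    Returns the choice and the number of ticks it took.
    """
    if n_pipes < 1:
        raise ValueError("need at least one pipe")
    queue = cycle(networks.values())  # networks take turns at the pipes
    total = 0
    for tick in range(1, max_ticks + 1):
        total += sum(islice(queue, n_pipes))  # this tick's deliveries
        if abs(total) >= threshold:
            return ("apple" if total > 0 else "chocolate"), tick
    return "undecided", max_ticks


for pipes in (1, 2, 4):
    choice, ticks = decide(MEMORY_NETWORKS, n_pipes=pipes, threshold=14)
    print(f"{pipes} pipe(s): {choice} after {ticks} ticks")
```

With these numbers, one pipe takes 11 ticks to reach the threshold, two pipes take 6 and four pipes take 3. The evidence is identical in every run, so the extra delay comes entirely from the networks taking turns at the bottleneck.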
Got it. That makes sense, thanks!