Thanks for the comment! A few replies:
I don’t mean to imply that subagents are totally separate entities. At the very least, they can all access many shared facts and experiences.
And I don’t think that reuse of subcomponents is mutually exclusive with the mechanisms I described. In fact, you could see my mechanisms as attempts to figure out which subcomponents are used for coordination. (E.g. if a bunch of subagents are voting/bargaining over which goal to pursue, probably the goal that they land on will be one that’s pretty comprehensible to most of them.)
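To make that parenthetical concrete, here’s a minimal sketch, assuming a deliberately crude model in which each subagent’s support for a goal is discounted by how well it understands that goal. The subagent names, the numbers, and the additive aggregation rule are all my own hypothetical choices for illustration, not anything from the post:

```python
from dataclasses import dataclass

@dataclass
class Subagent:
    name: str
    values: dict[str, float]         # intrinsic value placed on each goal, in [0, 1]
    comprehension: dict[str, float]  # how well the goal is understood, in [0, 1]

    def support(self, goal: str) -> float:
        # A subagent can only throw its weight behind a goal to the
        # extent that it actually understands the goal.
        return self.values.get(goal, 0.0) * self.comprehension.get(goal, 0.0)

def vote(subagents: list[Subagent], goals: list[str]) -> str:
    # Simple additive aggregation; a bargaining mechanism would differ
    # in detail, but shares the property that opaque goals attract
    # little support from subagents that can't parse them.
    totals = {g: sum(a.support(g) for a in subagents) for g in goals}
    return max(totals, key=totals.get)

goals = ["broadly_legible_goal", "niche_opaque_goal"]
subagents = [
    Subagent("curiosity",
             values={"broadly_legible_goal": 0.6, "niche_opaque_goal": 0.9},
             comprehension={"broadly_legible_goal": 0.9, "niche_opaque_goal": 0.2}),
    Subagent("social",
             values={"broadly_legible_goal": 0.7, "niche_opaque_goal": 0.8},
             comprehension={"broadly_legible_goal": 0.8, "niche_opaque_goal": 0.1}),
    Subagent("comfort",
             values={"broadly_legible_goal": 0.5, "niche_opaque_goal": 1.0},
             comprehension={"broadly_legible_goal": 0.9, "niche_opaque_goal": 0.3}),
]

print(vote(subagents, goals))  # -> "broadly_legible_goal"
```

Under these toy numbers the broadly legible goal wins even though every subagent intrinsically values the opaque goal more, which is the flavor of effect I have in mind.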
Re shards: there are a bunch of similarities. But it seemed to me that shard theory was focused on pretty simple subagents. E.g. from the original post: “Human values are … sets of contextually activated heuristics”; and later “human values are implemented by contextually activated circuits which activate in situations downstream of past reinforcement so as to steer decision-making towards the objects of past reinforcement”.
Whereas I think of many human values as being constituted by subagents that are far too complex to be described in that way. In my view, many important subagents are sophisticated enough that basically any description you give of them would also have to be a description of a whole human (e.g. if you wouldn’t describe a human as a “contextually activated circuit”, then you shouldn’t describe subagents that way).
This may just be a vibes difference; many roads lead to Rome. But the research directions I’ve laid out above are very distinct from the ones that shard theory people are working on.
EDIT: more on shards here.
Thank you for the reply. You might be interested in neural Darwinism, if you haven’t heard of it; the comment you linked in the edit made me think of it: https://en.wikipedia.org/wiki/Neural_Darwinism.
I don’t have a good story for how reuse of subcomponents leads to cooperation across agents, but my gut says to look at reuse rather than voting or markets. Could be totally baseless though.