Complex Behavior from Simple (Sub)Agents

Epistemic Status: Simultaneously this is work that took me a long time and a lot of thought, and also a playful and highly speculative investigation. Consider taking this seriously but not literally.


Take a simple agent (GitHub; Python), with no capacity for learning, that exists on a 2D plane. It shares the plane with other agents and objects, to be described shortly.

The agent intrinsically doesn’t want anything. But it can be assigned goal-like objects, which one might view as subagents. Each individual goal-like subagent can possess a simple preference, such as a desire to reach a certain region of space, or a desire to avoid a certain point.

The goal-like subagents can also vary in the degree to which they remain satisfied. Some might be permanently satisfied after achieving their goal once; some might quickly become unsatisfied again after a few timesteps.
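As a rough sketch of how such a goal-like subagent might be encoded, here is a minimal class. The class name, the inverse-distance valence shape, and the recharge mechanics are my guesses for illustration, not necessarily the linked repository’s API:

```python
import math

class Subagent:
    """A goal-like subagent: attracted to (or repelled by) one point.

    All names and constants here are illustrative guesses, not the
    original code's interface."""

    def __init__(self, target, strength=1.0, radius=1.0,
                 recharge_steps=20, avoid=False):
        self.target = target                  # (x, y) the subagent cares about
        self.strength = strength
        self.radius = radius                  # how close counts as "reached"
        self.recharge_steps = recharge_steps  # timesteps of satiation
        self.avoid = avoid
        self.satisfaction = 0                 # satiation steps remaining

    def __call__(self, pos):
        """Valence this subagent reports for a (candidate) position."""
        d = math.hypot(pos[0] - self.target[0], pos[1] - self.target[1])
        if self.avoid:
            return -self.strength / (d + 1e-6)  # strongly repelled when near
        if self.satisfaction > 0:
            return 0.0                          # temporarily sated: silent
        return self.strength / (d + 1e-6)       # pulled harder when near

    def tick(self, pos):
        """Once per timestep: trigger satiation on arrival, or let it wear off."""
        d = math.hypot(pos[0] - self.target[0], pos[1] - self.target[1])
        if not self.avoid and d < self.radius:
            self.satisfaction = self.recharge_steps
        elif self.satisfaction > 0:
            self.satisfaction -= 1
```

Setting `recharge_steps` to a huge number approximates the permanently-satisfied case; a small number gives the quickly-dissatisfied case.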

Every timestep, the agent considers ten random movements of unit distance, and executes the movement corresponding to the highest expected valence being reported by its goal-like subagents, in a winner-take-all fashion.
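A minimal sketch of that update rule, assuming each subagent is simply a callable mapping a candidate position to its reported valence (the mechanics are my reconstruction; the linked code may differ):

```python
import math
import random

def step(pos, subagents, n_candidates=10):
    """One timestep: sample random unit moves and take the one whose
    single loudest subagent valence (winner-take-all, not a sum) is highest."""
    best_move, best_valence = None, -math.inf
    for _ in range(n_candidates):
        angle = random.uniform(0.0, 2.0 * math.pi)
        candidate = (pos[0] + math.cos(angle), pos[1] + math.sin(angle))
        valence = max(sa(candidate) for sa in subagents)  # winner takes all
        if valence > best_valence:
            best_move, best_valence = candidate, valence
    return best_move
```

Note that only the loudest subagent’s opinion of each candidate counts; Section 8 returns to why the valences are not summed.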

Even with such an intentionally simplistic model, a surprising and illuminating level of behavioral complexity can arise.

Sections 1-8 concern interesting or amusing behaviors exhibited by the model.

Sections 9-12 outline future directions for the model and ruminations on human behavior.

1. Baseline

In this image, the path of the agent is painted with points, the color of the points changing slowly with the passage of time. This agent possesses three subagents with preferences for reaching the three green circles, and a fourth subagent with a mild preference for avoiding the red circle.

Once it comes within a set distance of one of the green circles, the corresponding subagent is satisfied, and thus the movement with the highest expected valence switches to the next-highest-valence goal. The satisfaction gradually wears off, and the agent begins to be drawn to the goal again. Thus, the agent moves inexorably around the triangle of green circles, sometimes in a circuit, sometimes backtracking.

2. “Ugh field”

If the aversion to the red circle is amplified above a certain threshold, this behavior results. The subagent with a preference for reaching the top green circle still exists, but it will never be satisfied, because the expected negative valence of passing near the red circle is too high.

But if one is clever, one can find a way around aversions, by inventing intermediary goals or circumventing the aversion with intermediate desirable states.

Sometimes you want to accomplish something, but a seemingly trivial inconvenience will arise to screen off your motivation. If you can’t remove the inconvenience, you can usually find a path around it.

3. Smartphone

What if the agent has, pinned to its position (such that it is constantly somewhat nearby), a low-valence rewarding object, which doesn’t provide lasting satisfaction? (In other words—the agent has a goal-like subagent which mildly but relatively persistently wants to approach the pinned object.)
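One way to encode this hypothetical “smartphone” is as a valence source whose target is defined relative to the agent rather than fixed in the world. The names, offset, and strength below are mine, chosen only to illustrate the idea:

```python
import math

def pinned_distraction(strength=0.3, offset=(0.7, 0.0)):
    """A weak, never-satiating attractor pinned at a fixed offset from
    the agent itself, so it is always 'somewhat nearby'. Illustrative only."""
    def valence(agent_pos, candidate):
        pin = (agent_pos[0] + offset[0], agent_pos[1] + offset[1])
        d = math.hypot(candidate[0] - pin[0], candidate[1] - pin[1])
        return strength / (d + 1e-6)  # mild, but it never switches off
    return valence
```

Because the pin travels with the agent, this subagent is always competitive at close range, which is exactly what lets a weak valence repeatedly hijack the winner-take-all choice.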

The agent suddenly looks very distracted, doesn’t it? It doesn’t make the same regular productive circuits of its goals. It seems to frequently get stuck, sphexishly returning to a goal that it just accomplished, and to take odd pointless energy-wasting zigzags in its path.

Maybe it’s bad to constantly carry around attention-grabbing objects that provide us with minuscule, unsatisfying hits of positive valence.

Considered together, Parts 2 and 3 speak to the dangers of convenience and the power of trivial inconvenience. The agents (and humans) are extraordinarily sensitive not only to the absolute valence of an expectation, but to the proximity of that state. Even objectively weak subagents can motivate behavior if they are unceasingly present.

4. Agitation and/or Energy

The model does not actually have any concept of energy, but it is straightforward to encode a preference for moving around a lot. When the agent is so inclined, its behavior becomes chaotic.
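One simple way to encode “a preference for moving around a lot” (my own guess at a mechanism, not the repository’s): keep a decaying average of recent positions, and reward candidate moves that get away from it.

```python
import math

class Restlessness:
    """Hypothetical agitation subagent: rewards moves away from a running
    average of recent positions, so lingering anywhere scores poorly."""

    def __init__(self, strength=0.5, decay=0.9):
        self.strength = strength
        self.decay = decay
        self.centroid = None  # decaying average of where we've been

    def __call__(self, candidate):
        if self.centroid is None:
            return 0.0
        d = math.hypot(candidate[0] - self.centroid[0],
                       candidate[1] - self.centroid[1])
        return self.strength * d  # farther from familiar ground is better

    def tick(self, pos):
        """Once per timestep: fold the current position into the average."""
        if self.centroid is None:
            self.centroid = pos
        else:
            self.centroid = (
                self.decay * self.centroid[0] + (1 - self.decay) * pos[0],
                self.decay * self.centroid[1] + (1 - self.decay) * pos[1],
            )
```

Raising `strength` makes this subagent win the competition more often, producing the chaotic behavior described above.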

Even a relatively moderate preference for increased movement will lead to some erratic swerves in behavior.

If one wished, one could map this type of behavior onto agitation, or ADHD, or anxiety, or being overly caffeinated. On the other hand, you could view some degree of “restlessness” as a drive toward exploration, without which one might never discover new goals.

One path of investigation that occurred to me but which I did not explore was to give the agent a level of movement-preference that waxed and waned cyclically over time. Sometimes you subjectively have a lot of willpower, sometimes you subjectively can’t focus on anything. But, on the whole, we all manage to get stuff done.

5. Look-Ahead

I attempted to implement an ability for the agent to scan ahead more than one step into the future and take the movement corresponding to the highest expected valence in two timesteps, rather than just the next timestep. This didn’t really show anything interesting, and remains in the category of things that I will continue to look into. (The Red agent is thinking two moves ahead, the Blue agent only one move ahead. Is there a difference? Is the clustering of the Red agent’s pathing slightly tighter? Difficult to say.)
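The look-ahead can be sketched as a small recursive search over sampled moves (my reconstruction of the idea, not the original code). Note the cost grows as `n_candidates ** depth`, which is the computational expense discussed below:

```python
import math
import random

def lookahead_valence(pos, subagents, depth, n_candidates=10):
    """Best winner-take-all valence reachable within `depth` further unit
    moves. depth=1 is the myopic agent; depth=2 thinks two moves ahead."""
    if depth == 0:
        return max(sa(pos) for sa in subagents)
    best = -math.inf
    for _ in range(n_candidates):
        angle = random.uniform(0.0, 2.0 * math.pi)
        nxt = (pos[0] + math.cos(angle), pos[1] + math.sin(angle))
        best = max(best, lookahead_valence(nxt, subagents, depth - 1,
                                           n_candidates))
    return best
```

The agent would then execute the first move of whichever sampled branch scored highest; even at depth 2 this evaluates a hundred positions per step instead of ten.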

I don’t personally think humans explicitly look ahead very often. We give ourselves credit as the “thinking, planning animal”, but we generally just make whichever choice corresponds to the highest expected valence in the current moment. Looking ahead is also very computationally expensive—both for people, and for these agents—because it inevitably requires something like a model-based tree search. What I think we actually do is better addressed in Section 10 regarding Goal Hierarchies.

6. Sociability

Of course, we can give the agents preferences for being near other agents, obeying the same rules as the preferences for any other position in space.

With hyper-dominant, non-extinguishing preferences for being around other agents, we get this piece of computer-generated art that I call “Lovers”.

With more modest preferences for the company of other agents, and with partially-overlapping goals (the Blue agent wants to spend time around the top and rightmost targets, the Red agent around the top and leftmost targets), you get this other piece of art that I call “Healthy Friendship”. It looks like they’re having fun, doesn’t it?

7. New Goals Are Disruptive

Brief reflection should confirm that introducing a new goal into your life can be very disruptive to your existing goals. You could say that permitting a new goal-like subagent to take root in your mind is akin to introducing a competitor who will now be bidding against all your existing goals for the scarce resource of your time and attention.

Compare this image with the Baseline at the top of this article. The new, powerful top-right goal has siphoned away all the attention from the formerly stable, well-tended trio of goals.

I think one of the main reasons we fall down on our goals is simply that we spontaneously generate new goals, and these new goals disrupt our existing motivational patterns.

8. Winner-Take-All?

You may have more questions about the winner-take-all assumption that I mentioned above. In this simple model, the goal-like subagents do not “team up”. If two subagents would prefer that the agent move to the left, this does not mean that their associated valences will sum and make that choice more globally appealing. The reason is simple: if you straightforwardly sum over all valences instead of picking a winner, this is what happens:

The agent simply seeks out a local minimum and stays there.
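The two scoring rules can be compared directly. In this sketch (an illustrative reconstruction, with quadratic attraction valences I chose to make the failure vivid), summation is maximized at the compromise point between two goals, so the agent parks in the middle and satisfies neither:

```python
import math
import random

def choose_move(pos, subagents, mode="wta", n_candidates=10):
    """The ten-candidate step, scoring either by the loudest subagent
    ('wta') or by naive summation ('sum')."""
    best, best_v = None, -math.inf
    for _ in range(n_candidates):
        angle = random.uniform(0.0, 2.0 * math.pi)
        cand = (pos[0] + math.cos(angle), pos[1] + math.sin(angle))
        vals = [sa(cand) for sa in subagents]
        v = max(vals) if mode == "wta" else sum(vals)
        if v > best_v:
            best, best_v = cand, v
    return best

# Two attractors with quadratic valence; their sum is maximized at the
# midpoint (0, 0), so under 'sum' the agent drifts there and stays.
goals = [lambda p: -((p[0] + 5) ** 2 + p[1] ** 2),
         lambda p: -((p[0] - 5) ** 2 + p[1] ** 2)]
```

Under `mode="wta"` the same agent instead commits to whichever goal is currently loudest and actually reaches it.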

I am currently somewhat agnostic as to what the human or animal brain is actually doing. We do appear to get stuck in local minima sometimes. But you can get sphexish behavior that looks like a local minimum out of a particular arrangement of winner-take-all subagents. For example, if an agent is hemmed in by aversive stimuli with no sufficiently positive goal-states nearby, that might look like a local minimum, though it is still reacting to each aversive stimulus in a winner-take-all fashion.

Subjectively, though, it feels like if you have two good reasons supporting an action, that makes the action feel a bit easier to do, a bit more motivating, than if you just had one good reason. This hints that maybe goal-like subagents can gang up together. But I also doubt that this is anything like strictly additive. Thinking of 2,000 reasons why I should go to the gym isn’t 2,000 times more compelling than thinking of one reason.

9. Belief, Bias, and Learning

The main area of the model that I would like to improve, but which would amplify the complexity of the code tremendously, would be introducing the concept of bias and/or belief. The agent should be able to be wrong about its expected valence. I think this is hugely important, actually, and explains a lot about human behavior.

Pathologies arise when we are systematically wrong about how good, or how bad, some future state will be. But we can overcome pathologies by exposing ourselves to those states, and becoming deeply calibrated regarding their reality. On the aversion side this applies to everything from the treatment of phobias and PTSD, to the proper response to a reasonable-seeming anxiety. On the positive-valence side, we may imagine that it would be incredibly cool and awesome to do or to be some particular thing, and only experience can show us that accomplishing such things yields only a shadow of what we expected. Then the brain updates on that, and we cease to feel motivated to do that thing anymore. We can no longer sustain the delusion that it was going to be awesome.

10. Goal Hierarchies

It seems clear that, in humans, goals are arranged in something like trees: I finish this current push-up because I want to finish my workout. I want to finish my workout because I want to stay on my workout program. I want to stay on my workout program because I want to be strong and healthy.

But it’s almost certainly more complex than this, and I don’t know how the brain manages its “expected valence” calculations across levels of the tree.

I hypothesize that it goes something like this. Goal-like subagents concerned with far-future outcomes, like “being strong and healthy”, generate (or perhaps manifest as) more specific near-term goal-like targets, with accompanying concrete sensory-expectation targets, like “working out today”. This seems like one of those mostly automatic things that happens whether or not we engineer it. The automaticity of it seems to rely on our maps/models/beliefs about how the world works. Even much simpler animals can chain together and break down goals, in the course of moving across terrain toward prey, for example.

The model described above doesn’t really have a world model and can’t learn. I could artificially designate some goals as being sub-goals of other goals, but I don’t think this is how it actually works, and I don’t think it would yield any more interesting behavior. But it might be worth looking into. Perhaps the most compelling aspect of this area is that what would be needed would not be to amplify the cleverness of the agent; it would be to amplify the cleverness of the subagents in manipulating and making their preferences clearer to the agent. For example: give subagents the power to generate new goal-objects, and lend part of their own valence to those new goals.
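That last idea can be sketched very speculatively, with goal-objects reduced to plain dicts: a far-horizon goal spawns a concrete nearer waypoint goal and lends it part of its own valence budget. Every name and number here is hypothetical.

```python
def spawn_subgoal(parent, waypoint, share=0.5):
    """Split off `share` of a parent goal's strength into a new, nearer
    goal-object at `waypoint`. Mechanics are purely illustrative."""
    child = {"target": waypoint, "strength": parent["strength"] * share}
    weakened_parent = dict(parent, strength=parent["strength"] * (1 - share))
    return weakened_parent, child
```

In the workout example, a “be strong and healthy” goal would spawn a “work out today” goal that inherits some of its motivational pull, without the agent itself needing to get any cleverer.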

11. Suffering

I toyed with the idea of summing up all the valences of the goal-objects that were being ignored at any given moment, and calling that “suffering”. This sure is what suffering feels like, and it’s akin to what those of a spiritual bent would call suffering. Basically, suffering is wanting contradictory, mutually exclusive things, or being aware of wanting things to be a certain way while simultaneously being aware of your inability to work toward making them that way. One subagent wants to move left, one subagent wants to move right, but the agent has to pick one. Suffering is something like the expected valence of the subagent that is left frustrated.
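Under that definition, suffering at a given moment is just the wanting reported by every subagent except the winner. A sketch, assuming subagents are callables returning valence (the use of magnitudes, so that frustrated aversions count too, is my own choice):

```python
def suffering(pos, subagents):
    """Total magnitude of valence reported by the subagents that lost
    this timestep's winner-take-all competition: the unserved wanting."""
    vals = sorted((sa(pos) for sa in subagents), reverse=True)
    return sum(abs(v) for v in vals[1:])  # everyone but the winner
```

Summing this quantity over every timestep of a run gives the life-history total that the next paragraph tries to minimize.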

I had a notion here that I could stochastically introduce a new goal that would minimize total suffering over an agent’s life-history. I tried this, and the most stable solution turned out to be thus: introduce an overwhelmingly aversive goal that causes the agent to run far away from all of its other goals screaming. Fleeing in perpetual terror, it will be too far away from its attractor-goals to feel much expected valence towards them, and thus won’t feel too much regret about running away from them. And it is in a sense satisfied that it is always getting further and further away from the object of its dread.

File this under “degenerate solutions that an unfriendly AI would probably come up with to improve your life.”

I think a more well-thought-out definition of suffering might yield much more interesting solutions to the suffering-minimization problem. This is another part of the model I would like to improve.

12. Happiness and Utility

Consider our simple agents. What makes them happy?

You could say that something like satisfaction arises the moment they trigger a goal-state. But that goal object immediately begins recharging, becoming “dissatisfied” again. The agent is never actually content, unless you set up the inputs such that the goal valences don’t regenerate—or if you don’t give it goals in the first place. But if you did that, the agent would just wander around randomly after accomplishing its goals. That doesn’t seem like happiness.

Obviously this code doesn’t experience happiness, but when I look at the behavior of the agents under different assumptions, the agents seem happy when they are engaged in accomplishing their various goals. They seem unhappy when I create situations that impede the efficiency of their work. This is obviously pure projection, and says more about me, the human, than it says about the agent.

So maybe a more interesting question: What are the high-utility states for the agent? At any given moment in time the agents certainly have preference orderings, but those preference orderings shift quite dramatically based on each agent’s location and the exact states of its subagents, specifically their current levels of satisfaction. In other words, in order to mathematically model the preference ordering of the agent across all times, you must model the individual subagents.

If humans “actually” “have” “subagents”—whatever those words actually end up meaning—then the “human utility function” will need to encompass each and every subagent. Even, I think, the very stupid ones that you don’t reflectively endorse.


I set out on this little project because I wanted to prove some assumptions about the “subagent” model of human consciousness. I don’t think I can ultimately say that I “proved” anything, and I’m not sure that one could ever “prove” anything about human psychology using this particular methodology.

The line of thinking that prompted this exploration owes a lot to Kaj_Sotala’s ongoing Sequence, Scott Alexander’s reflections on motivation, and Mark Lippman’s Folding material. It’s also their fault I used the unwieldy language “goal-like subagent” instead of just saying “the agent has several goals”. I think it’s much more accurate, and useful, to think of the mind as being composed of subagents than to say it “has goals”. Do you “have” goals if the goals control you?

This exercise has changed my inner model of my own motivational system. If you think long enough in terms of subagents, something eventually clicks. Your inner life, and your behaviors, seem to make a lot more sense. Sometimes you can even leverage this perspective to construct better goals, or to understand where some goals are actually coming from.

The code linked at the top of this page will generate all of the figures in this article. It is not especially well documented, and bears the marks of having been programmed by a feral programmer raised in the wilds of various academic and industrial institutions. Be that as it may, the interface is not overly complex. Please let me know if anyone ends up playing with the code and getting anything interesting out of it.