Hypothesis about how social stuff works and arises


(I can’t be both­ered to write a real Se­ri­ous Post, so I’m just go­ing to write this like a tum­blr post. y’all are try­hards with writ­ing and it’s booooor­ing, and also I have a lot of tan­gen­tially re­lated stuff to say. Pls cri­tique based on con­tent. If some­thing is un­clear, quote it and ask for clar­ifi­ca­tion)

Alright so, this is intended to be an explicit description that, hopefully, could be turned into an actual program that would generate the same low-level behavior as the way social stuff arises from brains. Any divergence is a mistake, and should be called out and corrected. It is not intended to be a fake framework: it’s either actually a description of parts of the causal graph that are above a threshold level of impact, or it’s wrong. It’s hopefully also a good framework. I’m pretty sure it’s wrong in important ways, and I’d like to hear what people suggest to improve it.

Recom­mended knowl­edge: vague un­der­stand­ing of what’s known about how the cor­tex sheet im­ple­ments fast in­fer­ence/​how “sys­tem 1” works, how hu­man re­ward works, etc, and/​or how ANNs work, how re­in­force­ment learn­ing works, etc.

The hope is that the com­pu­ta­tional model would gen­er­ate so­cial stuff we ac­tu­ally see, as high-prob­a­bil­ity spe­cial cases—in semi-tech­ni­cal terms you can ig­nore if you want, I’m hope­ful it’s a good causal/​gen­er­a­tive model, aka that it al­lows com­press­ing com­mon so­cial pat­terns with at least some­what ac­cu­rate causal graphs.

The thing

So we’re mak­ing an ex­e­cutable model of part of the brain, so I’m go­ing to write it as a se­ries of changes I’m go­ing to make. (I’m un­com­fortable with the struc­tured-ness of this, if any­one has any ideas for how to gen­er­al­ize it, that would be helpful.)

  1. To start our brain thingy off, add direct preferences: experiences our new brain wants to have. Weight negative things much more heavily than good things, maybe around 5x.

    • From the in­side, this is an ex­pe­rience that in-the-mo­ment is en­joy­able/​satis­fy­ing/​juicy/​fun/​re­ward­ing/​at­trac­tive to you/​thrilling/​etc etc. Ba­sic stuff like drink­ing wa­ter, hav­ing snug­gles, be­ing ac­cepted, etc—prefer­ences that are na­ture and not nur­ture.

    • From the out­side, this is some­thing like the ex­pe­rience pro­duc­ing dopamine/​sero­tonin/​en­dor­phin/​oxy­tocin/​etc in, like, a young child or some­thing—ie, it’s na­tively re­ward­ing.

    • In the im­ple­mentable form of this model, our re­in­force­ment learner needs a state-re­ward func­tion.

    • So­cial sort of ex­ists here, but only in the form that if an agent can give some­thing you want, such as snug­gles, then you want that in­ter­ac­tion.
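
A minimal sketch of what the level-1 state-reward function could look like, in Python. The experience names and valence numbers are made up for illustration; the only thing taken from the description above is that bad things get weighted roughly 5x relative to good things.

```python
# Hypothetical innate valences for a few experiences (made-up numbers).
INNATE_VALENCE = {
    "drink_water": 1.0,
    "snuggles": 2.0,
    "accepted": 1.5,
    "pain": -1.0,
    "rejected": -1.5,
}

# "maybe around 5x" from the text; a guess, not a measurement.
NEGATIVE_WEIGHT = 5.0


def reward(experiences):
    """State-reward function: sum innate valences, scaling negatives up."""
    total = 0.0
    for name in experiences:
        v = INNATE_VALENCE.get(name, 0.0)  # unknown experiences are neutral
        total += v * NEGATIVE_WEIGHT if v < 0 else v
    return total
```

So `reward(["snuggles", "pain"])` comes out negative even though the raw valence of snuggles is bigger than the raw valence of pain, which is the asymmetry level 1 asks for.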

  2. Then, make the di­rect prefer­ences up­date by pul­ling the re­wards back through time.

    • From the inside, this is the experience of things that lead to rewarding things becoming rewarding themselves: operant conditioning and preferences that come from nurture, eg complex flavor preferences, room layout preferences, preferences for stability, preferences for hygiene being easy, etc.

    • From the outside, this is how dopamine release and such happens when a stimulus is presented that indicates an increase in future reward.

    • In the implementable form of this model, this is any temporal difference learning technique, such as Q-learning.

    • Social exists more here, in that our agent learns which agents reliably produce experiences that are level-1 preferred vs dispreferred. If there’s a level-1 boring/dragging/painful/etc thing another agent does, it might result in an update towards lower probability of good interactions with that agent in that context. If there’s a level-1 fun/good/satisfying/etc thing another agent does, it might result in an update towards that agent being good to interact with in that context and maybe in others.
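
To make "pulling the rewards back through time" concrete, here is a rough sketch of tabular Q-learning on a toy world. Everything about the world is an assumption for illustration: four states in a row, two actions (stay or approach), and a level-1 reward of 1.0 only on entering the last state. To keep it deterministic it sweeps over every state-action pair instead of acting epsilon-greedily, which is a simplification and not how an actual agent would gather experience.

```python
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.5   # learning rate (assumed)
N_STATES = 4  # states 0..3; entering state 3 is the level-1 reward
STAY, APPROACH = 0, 1


def step(state, action):
    """Deterministic toy world: return (next_state, reward)."""
    if action == APPROACH:
        nxt = state + 1
        return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)
    return state, 0.0


def learn(sweeps=200):
    # q[state][action]; the terminal state is never updated, so it stays 0.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            for a in (STAY, APPROACH):
                s2, r = step(s, a)
                target = r + GAMMA * max(q[s2])  # Q-learning bootstrap
                q[s][a] += ALPHA * (target - q[s][a])
    return q


q = learn()
```

After learning, approaching from state 0 is worth about 0.81 even though nothing directly rewarding happens there. That is the sense in which things that lead to rewarding things become rewarding themselves.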

  3. Then, mod­ify prefer­ences to deal with one-on-one in­ter­ac­tions with other agents:

    • Add track­ing of re­tri­bu­tion for other agents

      • From the in­side, this is feel­ing that you are your own per­son, get­ting an­gry if some­one does some­thing you don’t like, and be­com­ing less an­gry if you feel that they’re ac­tu­ally sorry.

      • From the out­side, this is peo­ple be­ing quick to anger and not think­ing things through be­fore get­ting an­gry about Bad Things. some­thing about SNS as well. I’m less fa­mil­iar with neu­ral im­ple­men­ta­tion of anger.

      • To im­ple­ment: Track re­tri­bu­tion-wor­thi­ness of the other agent. In­crease it if the other agent does some­thing you con­sider re­tri­bu­tion-wor­thy. Ini­tial­ize what’s re­tri­bu­tion-wor­thy to be “any­thing that hurts me”. Ini­tial­ize re­tri­bu­tion-wor­thi­ness of other agents to be zero. De­crease re­tri­bu­tion-wor­thi­ness once re­tri­bu­tion has been en­acted and ac­cepted as it­self not re­tri­bu­tion-wor­thy by the other agent.

    • Track deservingness/caring-for other agents. Keep decreasing an agent’s deservingness open as an option for how to enact retribution.

      • From the in­side, this is the feel­ing that you want good for other peo­ple/​urge to be fair. It is not the same thing as em­pa­thy.

      • From the outside, this is people naturally having moral systems.

      • To implement, have a world model that allows inferring other agents’ locations and preferences, and mix their preferences with yours a little, or something. A correct implementation of this is safe AI.

    • Track phys­i­cal power-over-the-world of you vs other agents

      • From the in­side, this is the feel­ing that some­one else is more pow­er­ful or that you are more pow­er­ful. (fixme: Also some­thing about the im­pro thing goes here? how to in­te­grate?)

      • From the out­side, this is an­i­mals’ hard­coded track­ing of threat/​power sig­nal­ing—I’d ex­pect to find it at least in other mammals

      • To implement, hand-train a pattern matcher on [Threatening vs Nonthreatening] data, and provide this as a feature to reinforcement learning. Also increase deservingness/decrease retribution-worthiness for agents that have high power: they are able to force this, so treat it as an acausal trade.
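
A rough sketch of the level-3 bookkeeping, to show how the three trackers could fit together. The class name, method names, and the specific arithmetic are all hypothetical. In particular, power is just a number handed in from outside here (standing in for the hand-trained threat pattern matcher), and "mix their preferences with yours a little" is modeled as a deservingness-weighted sum, which is only one guess at what that could mean.

```python
from dataclasses import dataclass, field


@dataclass
class SocialLedger:
    """Per-agent level-3 state. All names and constants are hypothetical."""
    retribution: dict = field(default_factory=dict)  # agent -> retribution-worthiness
    deserving: dict = field(default_factory=dict)    # agent -> caring weight in [0, 1]
    power: dict = field(default_factory=dict)        # agent -> perceived power in [0, 1]

    def observe_harm(self, agent, harm):
        # Initial rule from the text: anything that hurts me is retribution-worthy.
        # Unknown agents start at zero. High-power agents get a discount,
        # since they are able to force this anyway.
        discount = 1.0 - self.power.get(agent, 0.0)
        self.retribution[agent] = self.retribution.get(agent, 0.0) + harm * discount

    def retribution_accepted(self, agent, amount):
        # Call only once retribution was enacted AND the other agent
        # accepted it as not itself retribution-worthy.
        current = self.retribution.get(agent, 0.0)
        self.retribution[agent] = max(0.0, current - amount)

    def mixed_reward(self, own_reward, others_rewards):
        # Add each other agent's inferred reward, weighted by how
        # deserving we currently consider them (default: not at all).
        total = own_reward
        for agent, r in others_rewards.items():
            total += self.deserving.get(agent, 0.0) * r
        return total
```

Lowering `deserving[agent]` is then one available way to enact retribution, as the deservingness bullet above suggests. The sketch leaves out the other half of the power rule (raising deservingness for powerful agents).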

  4. Then, track other agents’ beliefs to iterate this over a social graph

    • Track other agents’ coalition-building power, and update the power-over-the-world dominance based on an agent’s ability to build coalitions and harness other agents’ power.

      • From the in­side, this is the feel­ing that some­one else has a lot of friends/​is pop­u­lar, or that you have a lot of friends/​are popular

    • Track other agents’ ver­bal trust­wor­thi­ness, up­date your mod­els on level 2 di­rectly from trusted agents’ state­ments of fact

    • Track other agents’ re­tri­bu­tion lists to form con­sen­sus on what is re­tri­bu­tion-wor­thy; up­date what you treat as re­tri­bu­tion-wor­thy off of what other agents will pun­ish you for not punishing

    • Track other agents’ re­tri­bu­tion sta­tus and de­serv­ing­ness among other agents, in case of co­or­di­nated pun­ish­ment.

    • Pre­dict agents’ Re­ward­ing­ness, Retri­bu­tion-wor­thi­ness, De­serv­ing­ness, and Power based on any proxy sig­nals you can get—try to up­date as fast as pos­si­ble.

    • Im­ple­men­ta­tion: I think all you need to do is add a world model ca­pa­ble of rol­ling-in mod­el­ing other agents mod­el­ing other agents etc as feel­ings, and then all of level 4 should nat­u­rally fall out of track­ing stuff from ear­lier lev­els, but I’m not sure. For what I mean by rol­ling-in, see Un­rol­ling so­cial metacognition
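
For a sense of what three of the level-4 bullets would look like if written out by hand, here is a sketch. Note that this hand-codes the behavior, which is not what the implementation bullet above proposes (there, it should fall out of a world model that models other agents modeling other agents). So treat it as a description of the target behavior, not the mechanism. All function names, the linear update rule, and the 0.5 threshold are assumptions.

```python
def update_from_statement(my_estimate, stated_value, trust):
    """Move a level-2 estimate toward a speaker's claim, in proportion to
    how verbally trustworthy we consider them (trust in [0, 1])."""
    return my_estimate + trust * (stated_value - my_estimate)


def consensus_norms(retribution_lists, threshold=0.5):
    """Treat an act as retribution-worthy if at least `threshold` of the
    observed agents would punish it.
    `retribution_lists` maps agent -> set of acts that agent punishes."""
    if not retribution_lists:
        return set()
    counts = {}
    for acts in retribution_lists.values():
        for act in acts:
            counts[act] = counts.get(act, 0) + 1
    n = len(retribution_lists)
    return {act for act, c in counts.items() if c / n >= threshold}


def coalition_power(own_power, allies):
    """Effective power = own power plus the power you can harness.
    `allies` maps agent -> (their power, probability they back you)."""
    return own_power + sum(p * backing for p, backing in allies.values())
```

For example, an agent with little direct power but several likely backers can come out ahead of a physically stronger loner under `coalition_power`, which is the popularity feeling described above.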

Things that seem like they’re miss­ing to me

  • Greg pointed out that cur­rent ar­tifi­cial RL (ie, step 1) is miss­ing some­thing sim­ple and im­por­tant about the way re­ward works in the brain, but nei­ther of us are quite sure what ex­actly it is.

  • Greg also pointed out that the way I’m think­ing about power here doesn’t prop­erly take into ac­count the sec­ond to sec­ond im­pro thing

  • Greg thought there were in­ter­est­ing bits about how peo­ple do em­pa­thy that dis­agree re­ally hard with the way I thought level 3 works

  • Lex had a bunch of in­ter­est­ing cri­tiques I didn’t re­ally un­der­stand well enough to use. I thiiink I might have in­te­grated them at this point? not sure.

  • A bunch of people, including me, hate anything that has levels, because it’s probably more structurally organized and less detailed than reality actually is. But I still feel like the levels thing is actually a pretty damn good representation. Suggestions welcome, callouts are not.

  • This ex­pla­na­tion sucks and peo­ple prob­a­bly won’t get use­ful in­tu­itions out of this the way I have from think­ing about it a lot

Misc interesting consequences

  • level 4 makes each of the other levels into partially-grounded Keynesian beauty contests (a thing from economics that was intended to model the stock market), which I think is where a lot of “status signaling” stuff comes from. But that doesn’t mean there isn’t a real beauty contest underneath.

  • level 2 means it’s not merely a sin­gle “emo­tional bank ac­count” de­cid­ing whether peo­ple en­joy you—it’s a ques­tion of whether they pre­dict you’ll be fun to be around, which they can keep do­ing even if you make a large mis­take once.

  • level 3 De­serv­ing­ness is refer­ring to how when peo­ple say “I like you but I don’t want to in­ter­act with you”, there is a mean­ingful pre­dic­tion about their fu­ture be­hav­ior be­ing pos­i­tive to­wards you that they’re mak­ing—they just won’t nec­es­sar­ily want to like, hang out
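
One way to picture the "partially-grounded" beauty contest from the level 4 point above, as a toy sketch. The model is an assumption made up for illustration: each agent rates a target as a mix of their own private experience of it and what everyone rated it last round, with a hypothetical `grounding` parameter setting the mix.

```python
def beauty_contest(private_signals, initial_ratings, grounding=0.3, rounds=50):
    """Each round, every agent's rating becomes a blend of their own private
    signal (the real contest) and last round's average rating (the
    Keynesian part). grounding=0 means ratings are pure convention."""
    ratings = list(initial_ratings)
    for _ in range(rounds):
        avg = sum(ratings) / len(ratings)
        ratings = [grounding * s + (1 - grounding) * avg for s in private_signals]
    return ratings


# Three agents whose real experiences are 1, 2, 3, starting from hype of 10.
grounded = beauty_contest([1.0, 2.0, 3.0], [10.0, 10.0, 10.0], grounding=0.3)
ungrounded = beauty_contest([1.0, 2.0, 3.0], [10.0, 10.0, 10.0], grounding=0.0)
```

With any grounding above zero, the initial hype washes out and the average rating drifts to the average real experience, though individual ratings stay squashed toward the consensus. With zero grounding the hype never decays, which is the pure status-signaling case.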

Ex­am­ples of things to an­a­lyze would be wel­come, to ex­er­cise the model, whether the ex­am­ples fit in it or not; I’ll share some more at some point, I have a bunch of notes to share.
