Hypothesis about how social stuff works and arises


(I can’t be bothered to write a real Serious Post, so I’m just going to write this like a tumblr post. y’all are tryhards with writing and it’s booooring, and also I have a lot of tangentially related stuff to say. Pls critique based on content. If something is unclear, quote it and ask for clarification)

Alright so, this is intended to be an explicit description that, hopefully, could be turned into an actual program that would generate the same low-level behavior as the way social stuff arises from brains. Any divergence is a mistake, and should be called out and corrected. It is not intended to be a fake framework: either it’s actually a description of the parts of the causal graph that are above a threshold level of impact, or it’s wrong. It’s hopefully also a good framework. I’m pretty sure it’s wrong in important ways, and I’d like to hear what people suggest to improve it.

Recommended knowledge: a vague understanding of what’s known about how the cortex sheet implements fast inference/how “system 1” works, how human reward works, etc, and/or how ANNs work, how reinforcement learning works, etc.

The hope is that the computational model would generate the social stuff we actually see, as high-probability special cases. In semi-technical terms (which you can ignore if you want): I’m hopeful it’s a good causal/generative model, aka that it allows compressing common social patterns with at least somewhat accurate causal graphs.

The thing

We’re making an executable model of part of the brain, so I’m going to write it as a series of changes I’m going to make. (I’m uncomfortable with the structured-ness of this; if anyone has ideas for how to generalize it, that would be helpful.)

  1. To start our brain thingy off, add direct preferences: experiences our new brain wants to have. Make negative things count for much more, maybe around 5x, than good things.

    • From the inside, this is an experience that in-the-moment is enjoyable/satisfying/juicy/fun/rewarding/attractive to you/thrilling/etc etc. Basic stuff like drinking water, having snuggles, being accepted, etc: preferences that are nature and not nurture.

    • From the outside, this is something like the experience producing dopamine/serotonin/endorphin/oxytocin/etc in, like, a young child or something; ie, it’s natively rewarding.

    • In the implementable form of this model, our reinforcement learner needs a state-reward function.

    • Social sort of exists here, but only in the form that if an agent can give something you want, such as snuggles, then you want that interaction.
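If you want step 1 as actual code, here’s a toy sketch. The valence input and all the names are made up by me; the 5x negative weighting is the only thing taken from above:

```python
# Step 1 sketch: a state-reward function with asymmetric weighting.
# "Valence" is a hypothetical raw score attached to an experience.
NEGATIVE_WEIGHT = 5.0  # negative experiences count ~5x as much as positive ones

def state_reward(valence: float) -> float:
    """Direct-preference reward: amplify negative experiences."""
    if valence < 0:
        return NEGATIVE_WEIGHT * valence
    return valence
```

So drinking water at valence +1 gives reward 1, while an equally intense bad experience at valence -1 gives reward -5.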

  2. Then, make the direct preferences update by pulling the rewards back through time.

    • From the inside, this is the experience of things that lead to rewarding things becoming rewarding themselves: operant conditioning and preferences that come from nurture, eg complex flavor preferences, room layout preferences, preferences for stability, preferences for hygiene being easy, etc.

    • From the outside, this is how dopamine release and such happens when a stimulus is presented that indicates an increase in future reward.

    • In the implementable form of this model, this is any temporal difference learning technique, such as Q-learning.

    • Social exists more here, in that our agent learns which agents reliably produce experiences that are level-1 preferred vs dispreferred. If there’s a level-1 boring/dragging/painful/etc thing another agent does, it might result in an update towards lower probability of good interactions with that agent in that context. If there’s a level-1 fun/good/satisfying/etc thing another agent does, it might result in an update towards that agent being good to interact with in that context and maybe in others.
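A minimal sketch of “pull the rewards back through time”, using tabular Q-learning (one TD technique among many; the hyperparameters are just illustrative):

```python
from collections import defaultdict

# Step 2 sketch: tabular Q-learning. States that merely *lead to* rewarding
# states become rewarding themselves, because value propagates backwards.
ALPHA, GAMMA = 0.5, 0.9  # learning rate, discount factor (illustrative)

Q = defaultdict(float)  # (state, action) -> estimated value

def td_update(state, action, reward, next_state, actions):
    """One temporal-difference update toward reward + discounted best next value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

After enough episodes, the value of merely approaching a reliably-fun agent goes up, even though the level-1 reward only arrives later, during the interaction itself.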

  3. Then, modify preferences to deal with one-on-one interactions with other agents:

    • Add tracking of retribution for other agents

      • From the inside, this is feeling that you are your own person, getting angry if someone does something you don’t like, and becoming less angry if you feel that they’re actually sorry.

      • From the outside, this is people being quick to anger and not thinking things through before getting angry about Bad Things. Something about the sympathetic nervous system as well; I’m less familiar with the neural implementation of anger.

      • To implement: track retribution-worthiness of the other agent. Increase it if the other agent does something you consider retribution-worthy. Initialize what’s retribution-worthy to “anything that hurts me”. Initialize retribution-worthiness of other agents to zero. Decrease retribution-worthiness once retribution has been enacted and accepted as itself not retribution-worthy by the other agent.
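That spec translates pretty directly into code. This is a sketch, not a claim about the neural implementation; the exact increment and decay amounts are made up:

```python
# Retribution-tracking sketch, following the spec above.
class RetributionTracker:
    def __init__(self):
        self.worthiness = {}              # other agent -> retribution-worthiness
        self.worthy_acts = {"hurts_me"}   # initialized to "anything that hurts me"

    def observe(self, agent, act):
        self.worthiness.setdefault(agent, 0.0)  # initialize other agents to zero
        if act in self.worthy_acts:
            self.worthiness[agent] += 1.0       # amount is an assumption

    def retribution_enacted(self, agent, accepted_by_other: bool):
        # Decrease only once the other agent accepts the retribution as
        # itself not retribution-worthy.
        if accepted_by_other:
            self.worthiness[agent] = max(0.0, self.worthiness[agent] - 1.0)
```

Note that if the other agent does *not* accept the retribution, worthiness stays elevated, which is one way feuds keep going.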

    • Track deservingness/caring-for other agents. Keep decreasing an agent’s deservingness open as an option for how to enact retribution.

      • From the inside, this is the feeling that you want good for other people/the urge to be fair. It is not the same thing as empathy.

      • From the outside, this is people naturally having moral systems.

      • To implement, have a world model that allows inferring other agents’ locations and preferences, and mix their preferences with yours a little, or something. (Correct implementation is safe AI.)
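“Mix their preferences with yours a little” could be as crude as a weighted sum. The linear form is purely my assumption; this is the not-safe-AI toy version:

```python
# Deservingness sketch: blend your reward with your *model* of another
# agent's reward, weighted by how much you care for them.
def effective_reward(own_reward: float, their_reward: float, caring: float) -> float:
    """caring ~ deservingness weight; 0 means pure self-interest."""
    return own_reward + caring * their_reward
```

With caring = 0.1, something mildly bad for you (-0.1) but great for a cared-for friend (+2.0) nets out positive, which is roughly the behavior this level is supposed to buy.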

    • Track physical power-over-the-world of you vs other agents

      • From the inside, this is the feeling that someone else is more powerful or that you are more powerful. (fixme: also something about the impro thing goes here? how to integrate?)

      • From the outside, this is animals’ hardcoded tracking of threat/power signaling; I’d expect to find it at least in other mammals.

      • To implement, hand-train a pattern matcher on [Threatening vs Nonthreatening] data, and provide this as a feature to reinforcement learning. Also increase deservingness/decrease retribution-worthiness for agents that have high power, because they are able to force this, so treat it as an acausal trade.
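The acausal-trade adjustment might look like this (linear form and coefficient are assumptions; the threat-feature pattern matcher is elided):

```python
# Power-tracking sketch: a more powerful agent could force good treatment
# anyway, so preemptively grant it, as if trading.
def power_adjust(deservingness, retribution_worthiness, their_power, my_power, k=0.1):
    """Return (deservingness, retribution_worthiness) adjusted for power gap."""
    advantage = max(0.0, their_power - my_power)  # only *their* advantage matters
    return (deservingness + k * advantage,
            max(0.0, retribution_worthiness - k * advantage))
```

One consequence of this form: agents weaker than you get no discount at all, which matches the ugly observable fact that retribution flows downhill more easily than up.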

  4. Then, track other agents’ beliefs to iterate this over a social graph

    • Track other agents’ coalition-building power; update the power-over-the-world dominance based on an agent’s ability to build coalitions and harness other agents’ power.

      • From the inside, this is the feeling that someone else has a lot of friends/is popular, or that you have a lot of friends/are popular.

    • Track other agents’ verbal trustworthiness; update your models on level 2 directly from trusted agents’ statements of fact.

    • Track other agents’ retribution lists to form consensus on what is retribution-worthy; update what you treat as retribution-worthy off of what other agents will punish you for not punishing.

    • Track other agents’ retribution status and deservingness among other agents, in case of coordinated punishment.

    • Predict agents’ Rewardingness, Retribution-worthiness, Deservingness, and Power based on any proxy signals you can get; try to update as fast as possible.

    • Implementation: I think all you need to do is add a world model capable of rolling-in modeling other agents modeling other agents etc as feelings, and then all of level 4 should naturally fall out of tracking stuff from earlier levels, but I’m not sure. For what I mean by rolling-in, see Unrolling social metacognition.
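One crude sketch of what rolling-in might compile to: depth-limited nested agent models, where my model of Bob contains models of what Bob thinks about Carol, and so on. The fixed depth cutoff, the fields tracked, and every name here are assumptions:

```python
# Level 4 sketch: nested agent models, truncated at a fixed depth so the
# "agents modeling agents modeling agents..." regress stays finite.
class AgentModel:
    """What I believe about an agent, including what they believe about others."""
    def __init__(self, name, depth=2):
        self.name = name
        self.trustworthiness = 0.5     # gates level-2 updates from their statements
        self.retribution_list = set()  # what I think this agent punishes
        self.depth = depth
        self.models = {}               # name -> AgentModel: what *they* think

    def model_of(self, other):
        if self.depth == 0:
            return None  # stop rolling in; treat deeper levels as a flat feeling
        if other not in self.models:
            self.models[other] = AgentModel(other, self.depth - 1)
        return self.models[other]
```

Level-4 quantities like coalition power would then be read off the nested models, e.g. summing the power of agents whose (modeled) models trust the coalition-builder. Whether a fixed depth is the right truncation, versus the lossy “as feelings” compression the post describes, is exactly the part I’m unsure about.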

Things that seem like they’re missing to me

  • Greg pointed out that current artificial RL (ie, step 1) is missing something simple and important about the way reward works in the brain, but neither of us is quite sure what exactly it is.

  • Greg also pointed out that the way I’m thinking about power here doesn’t properly take into account the second-to-second impro thing.

  • Greg thought there were interesting bits about how people do empathy that disagree really hard with the way I thought level 3 works.

  • Lex had a bunch of interesting critiques I didn’t really understand well enough to use. I thiiink I might have integrated them at this point? not sure.

  • A bunch of people including me hate anything that has levels, for probably being more organized structurally, and simpler in amount of detail, than reality actually is. But I still feel like the levels thing is actually a pretty damn good representation. Suggestions welcome, callouts are not.

  • This explanation sucks, and people probably won’t get useful intuitions out of this the way I have from thinking about it a lot.

Misc interesting consequences

  • Level 4 makes each of the other levels into partially-grounded Keynesian beauty contests (a thing from economics that was intended to model the stock market), which I think is where a lot of “status signaling” stuff comes from. But that doesn’t mean there isn’t a real beauty contest underneath.

  • Level 2 means it’s not merely a single “emotional bank account” deciding whether people enjoy you; it’s a question of whether they predict you’ll be fun to be around, which they can keep doing even if you make a large mistake once.

  • Level 3 Deservingness refers to how, when people say “I like you but I don’t want to interact with you”, there is a meaningful prediction they’re making about their future behavior being positive towards you; they just won’t necessarily want to, like, hang out.

Examples of things to analyze would be welcome, to exercise the model, whether the examples fit in it or not. I’ll share some more at some point; I have a bunch of notes to share.