Being a Robust Agent

Epistemic status: not adding anything new, but figured there should be a clearer reference post for this concept.

There’s a concept which many LessWrong essays have pointed at, but I don’t think there’s a single post really spelling it out. I’ve built up an understanding of it through conversations with Zvi and Critch, and reading particular posts by Eliezer such as Meta-Honesty. (Note: none of them necessarily endorse this post; it’s just my own understanding.)

The idea is: you might want to become a more robust agent.

By default, humans are a kludgy bundle of ad-hoc impulses. But we have the ability to reflect upon our decision making, and the implications thereof, and derive better overall policies.

I don’t think this is quite the same thing as instrumental rationality (although it’s tightly entwined). If your goals are simple and well-understood, and you’re interfacing in a social domain with clear rules, the most instrumentally rational thing might be to not overthink it and follow common wisdom.

But it’s particularly important if you want to coordinate with other agents over the long term, especially on ambitious, complicated projects in novel domains.

Some examples of this:

  • Be the sort of person that Omega (even a version of Omega who’s only 90% accurate) can clearly tell is going to one-box (a rough expected-value sketch follows this list). Or, more realistically – be the sort of person who your social network can clearly see is worth trusting, with sensitive information, or with power.

  • Be the sort of agent who cooperates when it is appropriate, defects when it is appropriate, and can recognize that cooperating-in-this-particular-instance might superficially look like defecting, without falling into that trap.

  • Think about the ramifications of people who think like you adopting the same strategy. Not as a cheap rhetorical trick to get you to cooperate on every conceivable thing. Actually think about how many people are similar to you. Actually think about the tradeoffs of worrying about a given thing. (Is recycling worth it? Is cleaning up after yourself at a group house? Is helping a person worth it? The answer actually depends; don’t pretend otherwise.)

  • If there isn’t enough incentive for others to cooperate with you, you may need to build a new coordination mechanism so that there is enough incentive. Complaining or getting angry about it might be enough of an incentive, but it often doesn’t work and/or doesn’t quite incentivize the thing you meant. (Be conscious of the opportunity costs of building this coordination mechanism instead of other ones. Mindshare is only so big.)

  • Be the sort of agent who, if some AI engineers were whiteboarding out the agent’s decision making, they would see that the agent makes robustly good choices, such that those engineers would choose to implement that agent as software and run it.

  • Be cognizant of order-of-magnitude. Prioritize (both for things you want for yourself, and for large scale projects shooting for high impact).

  • Do all of this realistically given your bounded cognition. Don’t stress about implementing a game-theoretically perfect strategy, but do be cognizant of how much computing power you actually have (and periodically reflect on whether your cached strategies can be re-evaluated given new information or more time to think). If you’re being simulated on a whiteboard right now, have at least a vague, credible notion of how you’d think better if given more resources.

  • Do all of this realistically given the bounded cognition of *others*. If you have a complex strategy that involves rewarding or punishing others in highly nuanced ways… and they can’t figure out what your strategy is, you may just be adding random noise instead of a clear coordination protocol.
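To make the Omega bullet concrete, here’s a minimal sketch of the expected-value arithmetic against a predictor that’s only 90% accurate, using the conventional illustrative payoffs ($1,000 in the transparent box, $1,000,000 in the opaque one). The function and variable names are my own stand-ins for the sketch, not anything canonical.

```python
# Toy expected-value comparison for Newcomb's problem with a noisy predictor.
# Payoffs are the conventional illustrative ones: $1,000 in the transparent box,
# $1,000,000 in the opaque box (filled only if the predictor expected one-boxing).

SMALL = 1_000
BIG = 1_000_000

def expected_value(one_box: bool, accuracy: float = 0.9) -> float:
    """Expected payoff, given the predictor guesses your choice with `accuracy`."""
    if one_box:
        # The opaque box is full iff the predictor correctly foresaw one-boxing.
        return accuracy * BIG
    # Two-boxing: you always get the small box, plus the big box in the cases
    # where the predictor *wrongly* expected you to one-box.
    return SMALL + (1 - accuracy) * BIG

print(expected_value(one_box=True))   # 900000.0
print(expected_value(one_box=False))  # 101000.0
```

Even a fairly unreliable predictor heavily rewards being legibly the kind of agent who one-boxes, and the same logic carries over to a social network deciding whether you’re worth trusting.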

Game Theory in the Rationalsphere

The EA and Rationality worlds include lots of people with ambitious, complex goals. They have a bunch of common interests and probably should be coordinating on a bunch of stuff. But:

  • They vary in how much they’ve thought about their goals.

  • They vary in what their goals are.

  • They vary in where their circles of concern are drawn.

  • They vary in how hard (and how skillfully) they’re trying to be game-theoretically sound agents, rather than just following local incentives.

  • They disagree on facts and strategies.

Being a robust agent means taking that into account, and executing strategies that work in a messy, mixed environment with confused allies, active adversaries, and sometimes people who are a little bit of both. (Although this includes creating credible incentives and punishments to deter adversaries from bothering, and encouraging allies to become less confused).
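As a hedged illustration of a strategy surviving a mixed environment, here’s a toy iterated prisoner’s dilemma in Python. The payoff matrix, the population mix (two reciprocating allies, one noisy “confused” agent, one unconditional defector), and the strategy names are all invented for this sketch; it just shows a simple, legible conditional strategy like tit-for-tat holding up across the whole mix better than blanket trust or blanket defection.

```python
import random

random.seed(0)  # reproducible scores for the noisy "confused" agent

# Standard prisoner's dilemma payoffs for the row player: (my move, their move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(history):
    """Cooperate first, then mirror the opponent's previous move."""
    return "C" if not history else history[-1][1]

def always_cooperate(history):
    return "C"

def always_defect(history):
    return "D"

def confused(history):
    """A 'confused ally': acts at random, regardless of history."""
    return random.choice(["C", "D"])

def play(strategy_a, strategy_b, rounds=50):
    """Score strategy_a over an iterated game; history entries are (own, opponent) moves."""
    history_a, history_b, score = [], [], 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(history_a), strategy_b(history_b)
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
        score += PAYOFF[(move_a, move_b)]
    return score

population = [tit_for_tat, tit_for_tat, confused, always_defect]
for candidate in (tit_for_tat, always_cooperate, always_defect):
    total = sum(play(candidate, opponent) for opponent in population)
    print(candidate.__name__, total)
```

The absolute numbers are meaningless; the point is that a strategy others can model and predict keeps cooperation going with allies without being freely exploitable by adversaries.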

I’m still mulling over exactly how to translate any of this into actionable advice (for myself, let alone others). But all the other posts I wanted to write felt like they’d be easier if I could reference this concept in an off-the-cuff fashion without having to explain it in detail.