Being a Robust Agent (v2)

Se­cond ver­sion, up­dated for the 2018 Re­view. See change notes.

There’s a concept which many LessWrong essays have pointed at (indeed, I think the entire sequences are exploring it). But I don’t think there’s a single post really spelling it out explicitly:

You might want to be­come a more ro­bust, co­her­ent agent.

By de­fault, hu­mans are a kludgy bun­dle of im­pulses. But we have the abil­ity to re­flect upon our de­ci­sion mak­ing, and the im­pli­ca­tions thereof, and de­rive bet­ter over­all poli­cies.

Some peo­ple find this nat­u­rally mo­ti­vat­ing – there’s some­thing aes­thet­i­cally ap­peal­ing about be­ing a co­her­ent agent. But if it’s not nat­u­rally ap­peal­ing, the rea­son I think it’s worth con­sid­er­ing is ro­bust­ness – be­ing able to suc­ceed at novel challenges in com­plex do­mains.

This is re­lated to be­ing in­stru­men­tally ra­tio­nal, but I don’t think they’re iden­ti­cal. If your goals are sim­ple and well-un­der­stood, and you’re in­ter­fac­ing in a so­cial do­main with clear rules, and/​or you’re op­er­at­ing in do­mains that the an­ces­tral en­vi­ron­ment would have rea­son­ably pre­pared you for… the most in­stru­men­tally ra­tio­nal thing might be to just fol­low your in­stincts or com­mon folk-wis­dom.

But in­stinct and com­mon wis­dom of­ten aren’t enough, such as when...

  • You ex­pect your en­vi­ron­ment to change, and de­fault-strate­gies to stop work­ing.

  • You are at­tempt­ing com­pli­cated plans for which there is no com­mon wis­dom, or where you will run into many edge-cases.

  • You need to co­or­di­nate with other agents in ways that don’t have ex­ist­ing, re­li­able co­or­di­na­tion mechanisms.

  • You ex­pect in­stincts or com­mon wis­dom to be wrong in par­tic­u­lar ways.

  • You are try­ing to out­perform com­mon wis­dom. (i.e. you’re a max­i­mizer in­stead of a satis­ficer, or are in com­pe­ti­tion with other peo­ple fol­low­ing com­mon wis­dom)

In those cases, you may need to develop strategies from the ground up. Your initial attempts may actually be worse than the common wisdom. But in the long term, if you can acquire gears-level understanding of yourself, the world and other agents, you might eventually outperform the default strategies.

Ele­ments of Ro­bust Agency

I think of Ro­bust Agency as hav­ing a few com­po­nents. This is not ex­haus­tive, but an illus­tra­tive overview:

  • De­liber­ate Agency

  • Gears-level-un­der­stand­ing of yourself

  • Co­her­ence and Consistency

  • Game The­o­retic Soundness

De­liber­ate Agency

First, you need to decide to be any kind of deliberate agent at all. Don’t just go along with whatever kludge of behaviors that evolution and your social environment cobbled together. Instead, make conscious choices about your goals and decision procedures that you reflectively endorse.

Gears Level Un­der­stand­ing of Yourself

In or­der to re­flec­tively en­dorse your goals and de­ci­sions, it helps to un­der­stand your goals and de­ci­sions, as well as in­ter­me­di­ate parts of your­self. This re­quires many sub­skills, such as the abil­ity to in­tro­spect, or to make changes to how your de­ci­sion mak­ing works.

(Meanwhile, it also helps to understand how your decisions interface with the rest of the world, and the people you interact with. Gears level understanding is generally useful. Scientific and mathematical literacy helps you validate your understanding of the world.)

Co­her­ence and Consistency

If you want to lose weight and also eat a lot of ice cream, that’s a valid set of hu­man de­sires. But, well, it might just be im­pos­si­ble.

If you want to make long term plans that re­quire com­mit­ment but also want the free­dom to aban­don those plans when­ever, you may have a hard time. Peo­ple you made plans with might get an­noyed.

You can make de­liber­ate choices about how to re­solve in­con­sis­ten­cies in your prefer­ences. Maybe you de­cide “ac­tu­ally, los­ing weight isn’t that im­por­tant to me”, or maybe you de­cide that you want to keep eat­ing all your fa­vorite foods but also cut back on over­all calorie con­sump­tion.

The “com­mit­ment vs free­dom” ex­am­ple gets at a deeper is­sue – each of those opens up a set of broader strate­gies, some of which are mu­tu­ally ex­clu­sive. How you re­solve the trade­off will shape what fu­ture strate­gies are available to you.

There are benefits to re­li­ably be­ing able to make trades with your fu­ture-self, and with other agents. This is eas­ier if your prefer­ences aren’t con­tra­dic­tory, and eas­ier if your prefer­ences are ei­ther con­sis­tent over time, or at least pre­dictable over time.

Game The­o­retic Soundness

There are other agents out there. Some of them have goals or­thog­o­nal to yours. Some have com­mon in­ter­ests with you, and you may want to co­or­di­nate with them. Others may be ac­tively harm­ing you and you need to stop them.

They may vary in…

  • What their goals are.

  • What their be­liefs and strate­gies are.

  • How much they’ve thought about their goals.

  • Where they draw their cir­cles of con­cern.

  • How hard (and how skil­lfully) they’re try­ing to be game the­o­ret­i­cally sound agents, rather than just fol­low­ing lo­cal in­cen­tives.

Be­ing a ro­bust agent means tak­ing that into ac­count. You must find strate­gies that work in a messy, mixed en­vi­ron­ment with con­fused al­lies, ac­tive ad­ver­saries, and some­times peo­ple who are a lit­tle bit of both. (This in­cludes cre­at­ing cred­ible in­cen­tives and pun­ish­ments to de­ter ad­ver­saries from both­er­ing, and mo­ti­vat­ing al­lies to be­come less con­fused).

Re­lated to this is leg­i­bil­ity. Your gears-level-model-of-your­self helps you im­prove your own de­ci­sion mak­ing. But it also lets you clearly ex­pose your poli­cies to other peo­ple. This can help with trust and co­or­di­na­tion. If you have a clear de­ci­sion-mak­ing pro­ce­dure that makes sense, other agents can val­i­date it, and then you can tackle more in­ter­est­ing pro­jects to­gether.


Here’s a smat­ter­ing of things I’ve found helpful to think about through this lens:

  • Be the sort of per­son that Omega can clearly tell is go­ing to one-box – even a ver­sion of Omega who’s only 90% ac­cu­rate. Or, less ex­ot­i­cally: Be the sort of per­son who your so­cial net­work can clearly see is worth trust­ing, with sen­si­tive in­for­ma­tion, or with power. De­serve Trust.

  • Be the sort of agent who co­op­er­ates when it is ap­pro­pri­ate, defects when it is ap­pro­pri­ate, and can re­al­ize that co­op­er­at­ing-in-this-par­tic­u­lar-in­stance might look su­perfi­cially like defect­ing, but avoid fal­ling into the trap.

  • Think about the ram­ifi­ca­tions of peo­ple who think like you adopt­ing the same strat­egy. Not as a cheap rhetor­i­cal trick to get you to co­op­er­ate on ev­ery con­ceiv­able thing. Ac­tu­ally think about how many peo­ple are similar to you. Ac­tu­ally think about the trade­offs of wor­ry­ing about a given thing. (Is re­cy­cling worth it? Is clean­ing up af­ter your­self at a group house? Is helping a per­son worth it? The an­swer ac­tu­ally de­pends, don’t pre­tend oth­er­wise).

  • If there isn’t enough in­cen­tive for oth­ers to co­op­er­ate with you, you may need to build a new co­or­di­na­tion mechanism so that there is enough in­cen­tive. Com­plain­ing or get­ting an­gry about it might be a good enough in­cen­tive but of­ten doesn’t work and/​or isn’t quite in­cen­tiviz­ing the thing you meant. (Be con­scious of the op­por­tu­nity costs of build­ing this co­or­di­na­tion mechanism in­stead of other ones. Be con­scious of try­ing and failing to build a co­or­di­na­tion mechanism. Mind­share is only so big)

  • Be the sort of agent who, if some AI en­g­ineers were white­board­ing out the agent’s de­ci­sion mak­ing, they would see that the agent makes ro­bustly good choices, such that those en­g­ineers would choose to im­ple­ment that agent as soft­ware and run it.

  • Be cog­nizant of or­der-of-mag­ni­tude. Pri­ori­tize (both for things you want for your­self, and for large scale pro­jects shoot­ing for high im­pact).

  • Do all of this realistically given your bounded cognition. Don’t stress about implementing a game theoretically perfect strategy, but do be cognizant of how much computing power you actually have (and periodically reflect on whether your cached strategies can be re-evaluated given new information or more time to think). If you’re being simulated on a whiteboard right now, have at least a vague, credible notion of how you’d think better if given more resources.

  • Do all of this realistically given the bounded condition of *others*. If you have a complex strategy that involves rewarding or punishing others in highly nuanced ways… and they can’t figure out what your strategy is, you may just be adding random noise instead of a clear coordination protocol.
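To make the “only 90% accurate” Omega point from the first bullet concrete, here is a rough sketch of the expected payoffs in Newcomb’s problem. The dollar amounts ($1,000,000 in the opaque box, $1,000 in the transparent one) are the conventional ones and are an assumption on my part, not something specified above. The calculation is also the simple “condition on what Omega predicted” one, which not every decision theory accepts; treat it as an illustration rather than a settled answer.

```python
def expected_payoffs(accuracy, big=1_000_000, small=1_000):
    """Return (one_box_ev, two_box_ev) for a predictor with the given accuracy.

    `big` and `small` are the assumed (conventional) box contents.
    """
    # One-boxer: the opaque box is full only if Omega correctly predicted one-boxing.
    one_box = accuracy * big
    # Two-boxer: always gets the small box; the opaque box is full only if Omega erred.
    two_box = small + (1 - accuracy) * big
    return one_box, two_box


def break_even_accuracy(big=1_000_000, small=1_000):
    """Accuracy at which both choices have equal expected payoff.

    Solves accuracy * big == small + (1 - accuracy) * big for accuracy.
    """
    return 0.5 + small / (2 * big)


one_box, two_box = expected_payoffs(0.9)
print(f"one-box: {one_box:,.0f}  two-box: {two_box:,.0f}")
print(f"break-even accuracy: {break_even_accuracy():.4f}")
```

Under these assumed payoffs, one-boxing comes out far ahead at 90% accuracy, and the break-even point sits only slightly above a coin flip. That is the sense in which being legibly trustworthy can pay off even when the people (or Omegas) reading you are quite fallible.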

Why is this im­por­tant?

If you are a max­i­mizer, try­ing to do some­thing hard, it’s hope­fully a bit ob­vi­ous why this is im­por­tant. It’s hard enough to do hard things with­out hav­ing in­co­her­ent ex­ploitable poli­cies and wasted mo­tion chas­ing in­con­sis­tent goals.

If you’re a satis­ficer, and you’re ba­si­cally liv­ing your life pretty chill and not stress­ing too much about it, it’s less ob­vi­ous that be­com­ing a ro­bust, co­her­ent agent is use­ful. But I think you should at least con­sider it, be­cause...

The world is unpredictable

The world is chang­ing rapidly, due to cul­tural clashes as well as new tech­nol­ogy. Com­mon wis­dom can’t han­dle the 20th cen­tury, let alone the 21st, let alone a sin­gu­lar­ity.

I feel comfortable making the claim: Your environment is almost certainly unpredictable enough that you will benefit from a coherent approach to solving novel problems. Understanding your goals and your strategy is vital.

There are two main rea­sons I can see to not pri­ori­tize the co­her­ent agent strat­egy:

1. There may be higher near-term pri­ori­ties.

You may want to build a safety net, to give your­self enough slack to freely ex­per­i­ment. It may make sense to first do all the ob­vi­ous things to get a job, have enough money, and so­cial sup­port. (That is, in­deed, what I did)

I’m not kid­ding when I say that build­ing your de­ci­sion­mak­ing from the ground up can leave you worse off in the short term. The valley of bad ra­tio­nal­ity be real, yo. See this post for some ex­am­ples of things to watch out for.

Be­com­ing a co­her­ent agent is use­ful, but if you don’t have a gen­eral safety net, I’d pri­ori­tize that first.

2. Self-reflection and self-modification are hard.

It re­quires a cer­tain amount of men­tal horse­power, and some per­son­al­ity traits that not ev­ery­one has, in­clud­ing:

  • So­cial re­silience and open­ness-to-ex­pe­rience (nec­es­sary to try non­stan­dard strate­gies).

  • Some­thing like ‘sta­bil­ity’ or ‘com­mon sense’ (I’ve seen some peo­ple try to re­build their de­ci­sion the­ory from scratch and end up hurt­ing them­selves).

  • In gen­eral, the abil­ity to think on pur­pose, and do things on pur­pose.

If you’re the sort of person who ends up reading this post, I think you are probably the sort of person who would benefit (someday, from a position of safety/slack) from attempting to become more coherent, robust and agentic.

I’ve spent the past few years hanging around people who are more agentic than me. It took a long while to really absorb their worldview. I hope this post gives others a clearer idea of what this path might look like, so they can consider it for themselves.

Game The­ory in the Rationalsphere

That said, the rea­son I was mo­ti­vated to write this wasn’t to help in­di­vi­d­u­als. It was to help with group co­or­di­na­tion.

The EA, Ra­tion­al­ity and X-Risk ecosys­tems in­clude lots of peo­ple with am­bi­tious, com­plex goals. They have many com­mon in­ter­ests and should prob­a­bly be co­or­di­nat­ing on a bunch of stuff. But they dis­agree on many facts, and strate­gies. They vary in how hard they’ve tried to be­come game-the­o­ret­i­cally-sound agents.

My original motivation for writing this post was that I kept seeing what seemed to me to be strategic mistakes in coordination. It seemed to me that people were acting as if the social landscape was more uniform than it is, and expecting people to be on the same “meta-page” of how to resolve coordination failure.

But then I re­al­ized that I’d been im­plic­itly as­sum­ing some­thing like “Hey, we’re all try­ing to be ro­bust agents, right? At least kinda? Even if we have differ­ent goals and be­liefs and strate­gies?”

And that wasn’t ob­vi­ously true in the first place.

I think it’s much eas­ier to co­or­di­nate with peo­ple if you are able to model each other. If peo­ple have com­mon knowl­edge of a shared meta-strate­gic-frame­work, it’s eas­ier to dis­cuss strat­egy and ne­go­ti­ate. If mul­ti­ple peo­ple are try­ing to make their de­ci­sion-mak­ing ro­bust in this way, that hope­fully can con­strain their ex­pec­ta­tions about when and how to trust each other.

And if you aren’t shar­ing a meta-strate­gic-frame­work, that’s im­por­tant to know!

So the most im­por­tant point of this post is to lay out the Ro­bust Agent paradigm ex­plic­itly, with a clear term I could quickly re­fer to in fu­ture dis­cus­sions, to check “is this some­thing we’re on the same page about, or not?” be­fore con­tin­u­ing on to dis­cuss more com­pli­cated ideas.