Integrity and accountability are core parts of rationality

Epistemic Sta­tus: Point­ing at early stage con­cepts, but with high con­fi­dence that some­thing real is here. Hope­fully not the fi­nal ver­sion of this post.

When I started study­ing ra­tio­nal­ity and philos­o­phy, I had the per­spec­tive that peo­ple who were in po­si­tions of power and in­fluence should pri­mar­ily fo­cus on how to make good de­ci­sions in gen­eral and that we should gen­er­ally give power to peo­ple who have demon­strated a good track record of gen­eral ra­tio­nal­ity. I also thought of power as this mostly un­con­strained re­source, similar to hav­ing money in your bank ac­count, and that we should make sure to pri­mar­ily al­lo­cate power to the peo­ple who are good at think­ing and mak­ing de­ci­sions.

That pic­ture has changed a lot over the years. While I think there is still a lot of value in the idea of “philoso­pher kings”, I’ve made a va­ri­ety of up­dates that sig­nifi­cantly changed my re­la­tion­ship to al­lo­cat­ing power in this way:

  • I have come to be­lieve that peo­ple’s abil­ity to come to cor­rect opinions about im­por­tant ques­tions is in large part a re­sult of whether their so­cial and mon­e­tary in­cen­tives re­ward them when they have ac­cu­rate mod­els in a spe­cific do­main. This means a per­son can have ex­tremely good opinions in one do­main of re­al­ity, be­cause they are sub­ject to good in­cen­tives, while hav­ing highly in­ac­cu­rate mod­els in a large va­ri­ety of other do­mains in which their in­cen­tives are not well op­ti­mized.

  • Peo­ple’s ra­tio­nal­ity is much more defined by their abil­ity to ma­neu­ver them­selves into en­vi­ron­ments in which their ex­ter­nal in­cen­tives al­ign with their goals, than by their abil­ity to have cor­rect opinions while be­ing sub­ject to in­cen­tives they don’t en­dorse. This is a tractable in­ter­ven­tion and so the best peo­ple will be able to have vastly more ac­cu­rate be­liefs than the av­er­age per­son, but it means that “hav­ing ac­cu­rate be­liefs in one do­main” doesn’t straight­for­wardly gen­er­al­ize to “will have ac­cu­rate be­liefs in other do­mains”.

    One is strongly pre­dic­tive of the other, and that’s in part due to gen­eral think­ing skills and broad cog­ni­tive abil­ity. But an­other ma­jor piece of the puz­zle is the per­son’s abil­ity to build and seek out en­vi­ron­ments with good in­cen­tive struc­tures.

  • Everyone is highly irrational in their beliefs about at least some aspects of reality, and positions of power in particular tend to create strong incentives that are not well aligned with the truth. This means that highly competent people in positions of power often have less accurate beliefs than competent people who are not in positions of power.

  • The design of systems that hold people who have power and influence accountable, in a way that aligns their interests with both forming accurate beliefs and the interests of humanity at large, is a really important problem, and is a major determinant of the overall quality of a community’s decision-making. General rationality training helps, but for collective decision-making the creation of accountability systems, the tracking of outcome metrics and the design of incentives are together at least as big a factor as the degree to which the individual members of the community are able to come to accurate beliefs on their own.

A lot of these up­dates have also shaped my think­ing while work­ing at CEA, LessWrong and the LTF-Fund over the past 4 years. I’ve been in var­i­ous po­si­tions of power, and have in­ter­acted with many peo­ple who had lots of power over the EA and Ra­tion­al­ity com­mu­ni­ties, and I’ve be­come a lot more con­vinced that there is a lot of low-hang­ing fruit and im­por­tant ex­per­i­men­ta­tion to be done to en­sure bet­ter lev­els of ac­countabil­ity and in­cen­tive-de­sign for the in­sti­tu­tions that guide our com­mu­nity.

I also have broadly libertarian intuitions, and a lot of my ideas about how to build functional organizations are based on the more startup-like approach that is favored here in Silicon Valley. Initially these intuitions seemed to be in conflict with the intuitions favoring more emphasis on accountability structures, with broken legal systems, ad-hoc legislation, dysfunctional boards and dysfunctional institutions all coming to mind immediately as accountability systems run wild. I’ve since reconciled my thoughts on these topics a good bit.


Some­what sur­pris­ingly, “in­tegrity” has not been much dis­cussed as a con­cept han­dle on LessWrong. But I’ve found it to be a pretty valuable virtue to med­i­tate and re­flect on.

I think of in­tegrity as a more ad­vanced form of hon­esty – when I say “in­tegrity” I mean some­thing like “act­ing in ac­cor­dance with your stated be­liefs.” Where hon­esty is the com­mit­ment to not speak di­rect false­hoods, in­tegrity is the com­mit­ment to speak truths that ac­tu­ally ring true to your­self, not ones that are just ab­stractly defen­si­ble to other peo­ple. It is also a com­mit­ment to act on the truths that you do be­lieve, and to com­mu­ni­cate to oth­ers what your true be­liefs are.

In­tegrity can be a dou­ble-edged sword. While it is good to judge peo­ple by the stan­dards they ex­pressed, it is also a sure­fire way to make peo­ple overly hes­i­tant to up­date. If you get pun­ished ev­ery time you change your mind be­cause your new ac­tions are now in­con­gru­ent with the prin­ci­ples you ex­plained to oth­ers be­fore you changed your mind, then you are likely to stick with your prin­ci­ples for far longer than you would oth­er­wise, even when ev­i­dence against your po­si­tion is mount­ing.

The great benefit that I experienced from thinking of integrity as a virtue is that it encourages me to build accurate models of my own mind and motivations. I can only act in line with ethical principles that are actually related to the real motivators of my actions. If I pretend to hold ethical principles that do not correspond to my motivators, then sooner or later my actions will diverge from my principles. I’ve come to think of a key part of integrity as the art of making accurate predictions about my own actions and communicating those as clearly as possible.

There are two nat­u­ral ways to en­sure that your stated prin­ci­ples are in line with your ac­tions. You ei­ther ad­just your stated prin­ci­ples un­til they match up with your ac­tions, or you ad­just your be­hav­ior to be in line with your stated prin­ci­ples. Both of those can back­fire, and both of those can have sig­nifi­cant pos­i­tive effects.

Who Should You Be Ac­countable To?

In the con­text of in­cen­tive de­sign, I find think­ing about in­tegrity valuable be­cause it feels to me like the nat­u­ral com­ple­ment to ac­countabil­ity. The pur­pose of ac­countabil­ity is to en­sure that you do what you say you are go­ing to do, and in­tegrity is the cor­re­spond­ing virtue of hold­ing up well un­der high lev­els of ac­countabil­ity.

Treating accountability as a variable also highlights one of the biggest error modes of accountability and integrity: choosing too broad an audience to hold yourself accountable to.

There is a tradeoff between the size of the group that you are being held accountable by, and the complexity of the ethical principles you can act under. Too large an audience, and you will be held accountable by the lowest common denominator of your values, which will rarely align well with what you actually think is moral (if you’ve done any kind of real reflection on moral principles).

Too small or too memetically close an audience, and you risk having too few people paying attention to what you do to actually help you notice inconsistencies between your stated beliefs and your actions. And the smaller the group that is holding you accountable, the smaller your inner circle of trust, which reduces the total amount of resources that can be coordinated under your shared principles.

I think a major mistake that even many well-intentioned organizations make is to try to be held accountable by some vague conception of “the public”. As they make public statements, someone in the public will misunderstand them. This causes a spiral of less communication, resulting in more misunderstandings, resulting in even less communication. The spiral culminates in an organization that is completely opaque about its actions and intentions, with the only communication being filtered by a PR department that has little interest in the observers acquiring any beliefs that resemble reality.

I think a gen­er­ally bet­ter setup is to choose a much smaller group of peo­ple that you trust to eval­u­ate your ac­tions very closely, and ideally do so in a way that is it­self trans­par­ent to a broader au­di­ence. Com­mon ver­sions of this are au­di­tors, as well as non­profit boards that try to en­sure the in­tegrity of an or­ga­ni­za­tion.

This is all part of a broader reflection on trying to create good incentives for myself and the LessWrong team. I will try to follow this up with a post that summarizes more concretely how all of this applies to LessWrong.

In sum­mary:

  • One lens to view in­tegrity through is as an ad­vanced form of hon­esty – “act­ing in ac­cor­dance with your stated be­liefs.”

    • To improve integrity, you can bring your actions in line with your stated beliefs, bring your stated beliefs in line with your actions, or rework both at the same time. These options all have failure modes, but also potential benefits.

  • Peo­ple with power some­times have in­cen­tives that sys­tem­at­i­cally warp their abil­ity to form ac­cu­rate be­liefs, and (cor­re­spond­ingly) to act with in­tegrity.

  • An im­por­tant tool for main­tain­ing in­tegrity (in gen­eral, and in par­tic­u­lar as you gain power) is to care­fully think about what so­cial en­vi­ron­ment and in­cen­tive struc­tures you want for your­self.

  • Choose care­fully who, and how many peo­ple, you are ac­countable to:

    • Too many peo­ple, and you are limited in the com­plex­ity of the be­liefs and ac­tions that you can jus­tify.

    • Too few peo­ple, too similar to you, and you won’t have enough op­por­tu­ni­ties for peo­ple to no­tice and point out what you’re do­ing wrong. You may also not end up with a strong enough coal­i­tion al­igned with your prin­ci­ples to ac­com­plish your goals.

[This post was origi­nally posted on my short­form feed]