Epistemic Status – some mixture of:

“My best guess, based on some theory, practice and observations. But very much _not_ battle-tested”
but also, “poetry that’s designed to get an idea across that isn’t necessarily precisely accurate”, intended to get across the generators for my current worldview.
Was waiting to post this until I resolved some disagreements that seemed upstream, but I think that’ll be a while, and maybe that was a bad reason to delay anyhow. idk. YOLO.

tl;dr:

People are not automatically robust agents, and neither are organizations.

An organization can become an agent (probably?) but only if it’s built right. Your default assumption should probably be that a given organization is not an agent (and therefore may not be able to credibly make certain kinds of commitments).

Your default assumption, if you’re building an organization, should probably be that it will not be an agent (and will have some pathologies common to organizations).

If you try on purpose to make it an agent, have good principles, etc…

...well, your organization probably still won’t be an agent, and some of those principles might get co-opted by adversarial processes. But I think it’s possible for an organization to at least be better at robust agency (and also better at being “good”, or “human value aligned”, or at least “aligned with the values of the person who founded it.”)

Becoming a robustly agentic person

For a few years I’ve been crystallizing what it means to be a robust agent, by which I mean: “Reliably performing well, even if the environment around you changes. Have good policies. Have good meta policies. Be able to interface well with people who might have a wide variety of strategies, some of whom might be malicious or confused.”

People are not born automatically strategic, nor are they born an “agent.”

If you want robust agency, you have to cultivate it on purpose.

I have a friend who solves a lot of problems using the multi-agent paradigm. He spends a lot of effort integrating and empowering his sub-agents. He treats them like adults, makes sure they understand each other and trust each other. He makes sure each of them has accurate beliefs, and he tries to empower each of them as much as possible so they have no need to compete.

This… doesn’t actually work for me.

I’ve tried things like internal double crux or internal family systems, and so far, it’s just produced a confused “meh.” Insofar as “sub-agents” is a workable framework, I still have a pretty adversarial relationship with myself. (When I’m having trouble sleeping or staying off Facebook, instead of figuring out what needs my sub-agents have and meeting them all… I just block Facebook for 16 hours a day and program my computer to turn itself off every hour of the night starting at 11pm.)

I’m tempted to write off my friend’s claims as weird-posthoc-narrative. But this friend is among the more impressive people I know, and consistently has good reasons for things that initially sound weird to me. (This shouldn’t be strong evidence to you, but it’s enough evidence for me personally to take it seriously)

I once asked him “so… how do you even get your sub-agents to say anything to each other? I can’t tell if I have sub-agents or not but if I do they sure seem incoherent. Have you always had coherent sub-agents?”

And he said (paraphrased by me), something like:

“You know how when you’re a baby, you’re a flailing incoherent mess. And then you become, like, a four-year-old and you can sort of communicate but you can’t keep promises or figure things out very well. And then you’re a teenager and… maybe you’re a reasonable person, but maybe you’re still angry and moody and think you know everything even though you’re like fourteen years old and kinda insufferable?

“But… eventually you become an actual person who can make reasonable trades, and keep contracts?

“My sub-agents were like that. Initially they were incoherent like a baby. But I spent years cultivating them and teaching them and helping them grow and now they’re, like, coherent entities that have accurate beliefs and can negotiate with each other and it’s all super reasonable.”

“An important element here was giving the sub-agents jobs. I looked at what Fear was doing, and one thing seemed to be ‘help me notice when a bad thing was going to happen to me.’ And I said, ‘Okay, Fear. This is now your official job. I will be helping you to do this. If you are doing a good job, or seem to be making mistakes, I will be giving you feedback about that.’”

This… was an interesting outlook.

The jury’s still out on whether sub-agents are a useful framework for me. But this still fits into an interesting meta-framework.

Subagents or no, people don’t stop growing as agents when they become adults – there’s more to learn. I’ve worked over the past few years to improve my ability to think, and have good policies that defend my values while interfacing better with potential allies and enemies and confused bystanders.

I still have a lot more to go.

Becoming a robust organization

People are not automatically robust agents.

Neither are organizations.

Whether or not sub-agents are a valid frame for humans (or for particular humans), they seem like a pretty valid lens to examine organizations through.

An organization is born without a brain, and without a soul, and it will not have either unless you proactively build it one. And, I suspect, you are limited in your ability to build it one by the degree of soul and brain that you have cultivated in yourself. (Where “you” is “whoever is building the organization”, which might be one founder or multiple co-founders)

Vignettes of Organizational Coherence

Epistemic Status: Somewhat poetry-esque. These vignettes from different organizations paint a picture more than they spell out an explicit argument. But I hope they help express the overall worldview I currently hold.

Holding off on Hiring

Y Combinator recommends that young startups avoid hiring people as long as possible. I think there are a number of reasons for this, but one guess is that your ability to grow the soul of your organization weakens dramatically as it scales. It’s much harder to communicate nuanced beliefs to many-people-at-once than to a few people.

The years where your organization is small, and everyone can easily talk to everyone… those are the years when you have the chance to plant the seed of agency and the spark of goodness, to ensure your organization grows into something that is aligned with your values.

The Human Alignment Problem

Ray Dalio, of Bridgewater, has a book of Principles that he endeavors to follow, and to have Bridgewater follow. I disagree with (or am quite skeptical about) a lot of his implementation details. But I think the meta-principle of having principles is valuable. In particular, writing things down so that you can notice when you have violated your previously stated principles seems important.

One thing he talks a lot about is “getting in sync”, which he discusses in this blog post:

For an organization to be effective, the people who make it up must be aligned on many levels—from what their shared mission is, to how they will treat each other, to a more practical picture of who will do what when to achieve their goals. Yet alignment can never be taken for granted because people are wired so differently. We all see ourselves and the world in our own unique ways, so deciding what’s true and what to do about it takes constant work.

Alignment is especially important in an idea meritocracy, so at Bridgewater we try to attain alignment consciously, continually, and systematically. We call this process of finding alignment “getting in sync,” and there are two primary ways it can go wrong: cases resulting from simple misunderstandings and those stemming from fundamental disagreements. Getting in sync is the process of open-mindedly and assertively rectifying both types.

Many people mistakenly believe that papering over differences is the easiest way to keep the peace. They couldn’t be more wrong. By avoiding conflicts one avoids resolving differences. People who suppress minor conflicts tend to have much bigger conflicts later on [...]

While it is straightforward to have a meritocracy in activities in which there is clarity of relative abilities (because the results speak for themselves such as in sports, where the fastest runner wins the race), it is much harder in a creative environment (where different points of view about what’s best have to be resolved). If they’re not, the process of sorting through disagreements and knowing who has the authority to decide quickly becomes chaotic. Sometimes people get angry or stuck; a conversation can easily wind up with two or more people spinning unproductively and unable to reach agreement on what to do.

For these reasons, specific processes and procedures must be followed. Every party to the discussion must understand who has what rights and which procedures should be followed to move toward resolution. (We’ve also developed tools for helping do this). And everyone must understand the most fundamental principle for getting in sync, which is that people must be open-minded and assertive at the same time.

The Treacherous Turn

This particular description about the treacherous turn (typically as applied to AI, but in this case using the example of a human) feels relevant:

To master lying, a child should:

1. Possess the necessary cognitive abilities to lie (for instance, by being able to say words or sentences).

2. Understand that humans can (deliberately) say falsehoods about the world or their beliefs.

3. Practice lying, allowing himself/herself to be punished if caught.

If language acquisition flourishes when children are aged 15-18 months, the proportion of them who lie (about peeking in a psychology study) goes from 30% at age 2, to 50% of three-year olds, eventually reaching 80% at eight. Most importantly, they get better as they get older, going from blatant lies to pretending to be making reasonable/honest guesses.

There is therefore a gap between the moment children could (in theory) lie (18 months) and the moment they can effectively lie and use this technique to their own advantage (8 years old). During this gap, parents can correct the kid’s moral values through education.

I’m not sure the metaphor quite holds. But it seems plausible that if you want an organization where individuals, teams and departments don’t lie (whether blatantly and maliciously, or through ‘honest goodhart-esque mistakes’, or through something like Benquo’s 4-level-simulacrum concept), you have some window in which you can try to install a robust system of honesty, honor and integrity, before the system becomes too powerful to shape.

Sometimes bureaucracy is successfully protecting a thing, and that’s good

Samo’s How to Use Bureaucracies matched my experience watching bureaucracies form. I’ve seen bureaucracies form that looked reasonably formed-on-purpose-by-a-competent-person, and I’ve seen glimpses of ones that looked sort of cobbled together like spaghetti towers.

An interesting viewpoint I’ve heard recently is “usually when people are complaining that bureaucracies don’t have souls, I think they’re just mad that the bureaucracy didn’t give them the resources they wanted. And the bureaucracy was specifically designed to stop people like them from exploiting it.

“Academic bureaucracies, say, have a particular goal of educating people and doing research. If you come to them with a plan that will educate people or improve research, they will usually give you what you want. If you come to them trying to get weird special exceptions or facilities for saving the world or whatever, they’ll be like ‘um, our job is not to save the world, it is to educate people and do research. If we gave resources to every person with a pet cause, we’d fall apart immediately.’”

“Likewise, if they impose a weird rule on you, it’s probably because in the past someone fucked up in some way relating to that rule. And dealing with the fallout was really annoying, and they decided they didn’t want to have to deal with that fallout ever again. Sorry that you think you’re a good exception or the rule is stupid – part of the point of policies is to abstract away certain things so they can’t bother you and you can focus on what matters.”

I’m not sure how often this is actually true and how often it’s just a convenient story (bureaucracies do seem to be built out of spaghetti towers). But it seems plausible in at least some cases. And it seems noteworthy that “having a soul” might be compatible with “including leviathanic institutions that don’t seem to care about you as a person.”

Sabotaging the Nazis

On the flipside...

LW user Lionhearted notes in Explicit and Implicit communication that during World War II, some allies went to infiltrate the Nazis and gum up the works. They received explicit instructions like:

“(11) General Interference with Organizations and Production [...]

(1) Insist on doing everything through “channels.” Never permit short-cuts to be taken in order to expedite decisions.

(2) Make “speeches.” Talk as frequently as possible and at great length. Illustrate your “points” by long anecdotes and accounts of personal experiences. Never hesitate to make a few appropriate “patriotic” comments.

(3) When possible, refer all matters to committees, for “further study and consideration.” Attempt to make the committees as large as possible — never less than five. [...]

(5) Haggle over precise wordings of communications, minutes, resolutions.

(6) Refer back to matters decided upon at the last meeting and attempt to re-open the question of the advisability of that decision.

(7) Advocate “caution.” Be “reasonable” and urge your fellow-conferees to be “reasonable” and avoid haste which might result in embarrassments or difficulties later on.

(8) Be worried about the propriety of any decision — raise the question of whether such action as is contemplated lies within the jurisdiction of the group or whether it might conflict with the policy of some higher echelon.”

And… well, this all sure sounds like the pathologies I normally associate with bureaucracy. This sort of thing seems to happen by default, as an organization scales.

There’s also Scott’s IRB Nightmare.

Organizations have to make decisions and keep promises.

Why can’t you just have individual agents within an organization? Why does it matter that the organization-as-a-whole be an agent?

If you can’t make “real” decisions and keep commitments, you will be limited in your ability to engage in certain strategies, and in some cases you’ll be unable to engage in mutually beneficial trade.

Organizations control resources that are often beyond the control of a single person, and involve complicated decision making procedures. Sometimes the procedure is a legible, principled process. Sometimes a few key people in the room-where-it-happens hash things out, opaquely. Sometimes it’s a legible-but-spaghetti-tower bureaucracy.

Any of these can work. Regardless, the organization may have access to resources beyond the sum of the individual people involved. But if the organization isn’t coherent, it can struggle to make the credible promises that trade requires. (It might get away with this a couple times, but then trading partners may become more skeptical.)

Possible failure modes:

Sometimes nobody has any power – everyone requires too many checks from too many other people and long term planning can’t happen on purpose.

Sometimes you talk to the head of the org, and maybe you even trust the head of the org, and they say the org will do a thing, but somehow the org doesn’t end up doing the thing.

Sometimes, you can talk to each individual person at the org and they all agree Decision X would be best, but they’re all afraid to speak up because there isn’t common knowledge that they agree with Decision X. Or, they do all agree and know it, but they can’t say it publicly because The Public doesn’t understand Decision X.

So Decision X doesn’t get made.

Sometimes you talk to each individual person and they each individually agree that Decision X is good, and you talk to the entire group and the entire group seems to agree that Decision X would be good, but… somehow Decision X doesn’t get done.

I think it makes sense for bureaucracies to exist sometimes, and to have the explicit purpose of preventing people from exploiting things too easily. But it’s still useful for some part of the institution to be able to make decisions and commitments that weren’t part of an explicitly laid-out bureaucratic chain.

Porous movements aren’t and can’t be agents

I think that agency requires a membrane: something that keeps particular people in and others out, such that you can have any deliberate culture, principles or decision making at all.

Relatedly, I think you need a membrane for Stag Hunts to work – if any rando can blunder into the formation at the last moment, there’s no way you can catch a stag.

Organizations have fairly strong membranes, and sometimes informal community institutions can as well. But this is relatively rare.

So while I’m disappointed sometimes when particular individuals and organizations don’t live up to the ideals I think they were trying for… I don’t think it makes much sense to hold most “movements” to the ideal of agency. Movements are too chaotic, too hard to police, too easy to show up in and start shouting and taking up attention.

Instead, I think of movements as a place where a lot of people with similar ideals are clustered together. This makes it easier to find and recruit people into organizations that do have membranes and can have principles.

Narrative control and contracts, as alternative coordination mechanisms

Another friend who ran an organization once remarked (paraphrased):

“It seemed like the organization’s main coordination mechanism was a particular narrative that people rallied around. When I was in charge, I felt like it was my job to uphold that narrative, even when the narrative got epistemically dicey. This felt really bad for my soul, and eventually I stopped being in charge.”

“I’m not sure what to do about this problem – organizations need some kind of coordination mechanism. I think a potential solution might be to make a central element of your company culture ‘upholding contracts.’ Maybe you don’t all share the same vision for the company, but you can make concrete trades. Some of those trades are ‘I will do X and you will pay me dollars’, and some might be between employees, like ‘I will work enthusiastically on this aspect of the company for 2 months if you work enthusiastically on that aspect of it.’”

This seems plausible to me. But importantly, I don’t think you get “uphold contracts” as a virtue for free. If you want your employees to be able to do it reliably, you need mechanisms to train and reinforce that. (I think if you recruit from some homogeneous cultures it might come more automatically, but it’s not my default experience.)

Integrity and Accountability

Habryka recently wrote about Integrity and Accountability, and it seemed useful to just quote the summary here:

One lens to view integrity through is as an advanced form of honesty – “acting in accordance with your stated beliefs.”

— To improve integrity, you can either try to bring your actions in line with your stated beliefs, or your stated beliefs in line with your actions, or reworking both at the same time. These options all have failure modes, but potential benefits.

— People with power sometimes have incentives that systematically warp their ability to form accurate beliefs, and (correspondingly) to act with integrity.

An important tool for maintaining integrity (in general, and in particular as you gain power) is to carefully think about what social environment and incentive structures you want for yourself.

Choose carefully who, and how many people, you are accountable to:

— Too many people, and you are limited in the complexity of the beliefs and actions that you can justify.

— Too few people, too similar to you, and you won’t have enough opportunities for people to notice and point out what you’re doing wrong. You may also not end up with a strong enough coalition aligned with your principles to accomplish your goals.

Open Problems in Robust Group Agency

Exercises for the reader, and for me:

1. How do you make sure your group has any kind of agency at all, let alone make it ‘value-aligned’?

2. How do you choose people to be accountable to? What if you’re trying to do something really hard, and there seem to be few or zero people who you trust enough to be accountable to?

3. It seems like the last cluster of people who tried to solve accountability created committees and boards and bureaucracies, and… I dunno, maybe that stuff works fine if you do it right. But it seems easy to become dysfunctional in particular ways. What’s up with that?

4. What “rabbit” strategies are available, within and without organizations, that are self-reinforcing in the near term, that can help build trust, accountability, and robust agency?

5. What “stag” strategies could you successfully execute on if you had a small group of people working hard together?

5b. How can you get a small group of dedicated, aligned people?

6. How can people maintain accurate beliefs in the face of groupthink?

7. How can any of this scale?

Robust Agency for People and Organizations
