A high integrity/epistemics political coalition?
I have goals that are much easier to reach with a powerful political bloc. Probably a lot of other people around here share them. (Goals include “ensure no powerful dangerous AI gets built”, “ensure governance of the US and world is broadly good / not decaying”, and “have good civic discourse that plugs into said governance.”)
I think it’d be good if there was a powerful, high integrity political bloc with good epistemics, trying to make those things happen.
Unfortunately the naive ways of doing that would destroy the good things about the rationalist intellectual scene. This post lays out some thoughts on how to have a political bloc with good epistemics and integrity.
Recently, I gave to the Alex Bores campaign. It turned out to raise a surprisingly serious amount of money.
I donated to Alex Bores fairly confidently. A few years ago, I donated to Carrick Flynn, feeling kinda skeezy about it. Not because there’s necessarily anything wrong with Carrick Flynn, but, because the process that promoted “donate to Carrick Flynn” to my attention was a self-referential “well, he’s an EA, so it’s good if he’s in office.” (There might have been people with more info than that, but I didn’t hear much about it).
Ultimately, I kinda agreed, but I wouldn’t have publicly defended the choice. This was during the FTX era, when money was abundant and we were starting to attract grifters (e.g. hearing explicit comments like “oh man, all you have to do is say you care about causes X and Y and you can get free money”). It was not sustainable to keep donating to people “because they were EA” or “because they mouthed the words ‘AI Safety’.”
Alas, there are important political goals I want to accomplish. Political goals require getting a lot of people moving in lockstep. Rationalists hate moving in lockstep. For good reason. At the time, my solution was “donate to Carrick Flynn, but feel skeezy about it.”
One option is to leave this to “The EA community” rather than trying to invoke “the rationalists.” Alas, I just… don’t really trust the EA community to do a good job here. Or, rather, them succeeding at this requires them to lean into the rationalist-y traits, which would reintroduce all the same allergies and handwringing. My political goals are nuanced. I don’t want to go the route of environmentalism that bans nuclear power and ends up making things worse.
The AI Safety Case
AI Safety isn’t the only thing you might want a powerful political bloc with good epistemics to support. Maybe people want to be ambitious and do something much more open-ended than that. But, this is the motivating case for why it’s in my top-5 things to maybe do, and it’s useful to dissect motivating cases.
I think many people around here agree we need to stop the development of unsafe, overwhelmingly powerful superintelligence. (We might disagree a lot about the correct steps to achieve that.)
Here are some ways to fail to do that:
you create a Molochian Moral Maze that’s in charge of “regulating AI”, which isn’t even trying to do the right thing, staffed by self-serving bureaucrats who hand out favors that have nothing to do with regulating unsafe, overwhelmingly powerful superintelligence.
you create a highly trusted set of technocrats who, unfortunately, are just wrong about what types of training runs, compute controls, or other interventions will actually work, because that’s a complex question.
you create some system that does approximately the right thing on Day 1, but still needs to be making “live” choices two decades later and has ossified by then.
you never got buy-in for the thing, because you didn’t know how to compromise and build alliances.
you built alliances that accomplish some superficially similar goal that isn’t solving the right problem.
That’s rough. Wat do?
What I think Wat Do is, figure out how to build a political network that is powerful enough to have leverage, but, is still based on a solid foundation of epistemic trust.
How do that?
Well, alas, I dunno. But it feels very achievable to me to do better than both “don’t play the game” and “naively play the game, shortsightedly.” Here are some thoughts on that.
Some reasons things are hard
This is difficult for lots of reasons. Here are some of the easier-to-articulate ones:
Mutual Reputation Alliances
A lot of the world runs on implicit alliances, where people agree to recommend each other as good people, and not to say bad things about each other.
One big reason ornery rationalists are like “politics is Real Hard to do without intellectual compromise” (while other people might be like “I see why you’d be worried, but you seem to be exaggerating the worry”) is that this dynamic is very pernicious. It fucks with epistemics in a way that is invisible if you’re not actively tracking it, and the mutual reputation alliances don’t want you to be tracking it, so it requires active effort to make it possible to track.
People feel an incentive to gain power generally
There are good (naive) reasons to gain power. You do need political power to get shit done. But, also, people feel an attraction to power for normal, boring, selfish reasons. It is easy to deceive yourself about your motivations here, and about what your motivations will be in the future when you’ve enmeshed yourself in a political alliance.
Lots of ways of gaining power involve Mutual Reputation Alliances, or other compromises.
(Oliver Habryka has argued to me that there are ways of gaining conditional power (as opposed to unconditional power) which involve less compromise. This post is mostly about gaining unconditional power, but it seemed worth flagging the difference.)
Private information is very relevant
There is some public info available, but for “will this broad political project work longterm”, it’s going to depend on things like “does so-and-so keep their word?”, “will so-and-so keep keeping their word if the political situation changes, or they see an opportunity for power?”
This requires knowing subtle details about their character, which you can only really get from people who have worked with them a bunch, who are often part of a mutual reputation alliance, won’t want their name attached to the info if you share it, and will only give you the info if you can share it in a way that won’t make it obvious that they were the one sharing it.
Powerful people can be vindictive
In addition to being embedded in mutual reputation alliances, powerful people can be vindictive if you try to share negative information about their character. And, since they are powerful, if they want to hurt you, they probably can.
People withhold bad information about powerful people out of fear, not just loyalty.
(One specific case of this is “they can sue you for libel, or at least threaten to.”)
Politics is broadly adversarial
There will be rival actors who don’t want your preferred candidate to be elected or your preferred policy to be implemented. They will actively make it hard for you to do this. They may do so with underhanded tactics that are difficult to detect, kept just under the threshold for feeling “unreasonable” so they’re hard to call out.
It also means that sometimes you want to raise funds or maneuver in secret.
Lying and Misleadingness are contagious
Mutual reputation alliances are costly because their effects radiate out beyond the alliance. In practice, there is not a sharp divide between the politicians and the rationalists. The people rallying support and finding private information will (by default, probably) radiate some pressure to not question the narrative, and to avoid making someone regret having shared information.
See also: Entangled Truths, Contagious Lies
Politics is the Mind Killer / Hard Mode
This is hard-mode enough when we’re just trying to be a corner of the internet talking about some stuff. It’ll matter a lot more if you are trying to achieve a political goal.
See: Politics is the Mind-Killer and Politics is hard mode
A high integrity political bloc needs to work longterm, not just once
A lot of these problems aren’t that bad if you’re doing a one-time political maneuver. You might make some enemies and risk a bit of tribal groupthink, but, eh, then you go back to doing other things and the consequences are bounded.
But, the whole point of building a Good Epistemics/Integrity political bloc is to keep persistently doing stuff. This will attract enemies, if it succeeds. It will also attract…
Grift
People will try to manipulate you into giving them money. Some instances of this might be well-intentioned. You need to be able to defend against it anyway.
Passwords should be costly to fake
If it’s known that there’s a High Integrity/Epistemics Political Bloc that’s on the lookout for sociopaths and subtle corruption, people will try to mouth the words that make it sound like they are avoiding sociopathy/subtle-corruption. This includes both candidates, and people running the rallying-campaigns to get candidates funded.
“I believe in AI safety” or “I care about epistemics” is an easy password to fake.
An example of a harder password to fake is “I have made many public statements about my commitments that would look bad for me if I betrayed them.”
For people running PACs or other orgs, “here are the incentives I have constructed to make it hard for myself / The Org to betray its principles” is even better. (e.g. OpenAI’s nonprofit governance structure did at least make it difficult, and take multiple years, for the org to betray its principles.)
Example solution: Private and/or Retrospective Watchdogs for Political Donations
One difficulty with political fundraising is that, early on, it often needs to happen in a low-key way, since if rival politicians know your plan they can work against it. But I think part of the process should be that some of the people involved in low-key private political fundraising are playing a watchdog role, helping establish mutual knowledge of things like whether a given politician...
Top Tier:
...has ever made a politically costly decision to stand by a principle
...does NOT have any track record of various flavors of sociopathy
...has ever gotten a bill passed that looks like it’d actually help with x-risk or civilizational sanity or other relevant things.
Mid Tier
...has ever stated out loud “I want to pass a bill that helps with x-risk or related stuff”, which at least establishes a reputation you can call them on later.
...has a reputation for consistently saying things that make sense, and not saying things that don’t make sense.
Minimum Tier:
...in private conversations, they seem to say things that make sense, promise to work on AI risk or important related things, etc… and, ideally, this is vouched for by someone who has a track record of successfully noticing sociopaths who claimed such things, but later betrayed their principles.
…they seem generally qualified for the office they’re running for.
I didn’t trust the people advocating for Alex Bores to have noticed sociopathy. But, he did in fact pass the RAISE Act. Scott Wiener tried to pass SB 1047 twice and succeeded the second time, sorta. They might still betray their principles later, but their track record indicates they are at least willing to put their actions where their mouths are, and the bills looked pretty reasonable.
That seemed good enough to me to be worth $7000 (given the other analysis arguing that the money would help them win).
If I imagine a high Integrity Political Bloc, I think it probably involves some sort of evaluator watchdog who a) privately researches and circulates information about candidates during the Low Key period, and b) writes public writeups afterwards that allow for retrospective sanity checking, and noticing if the political bloc is going astray.
I’d want the watchdogs to split up observations and inferences, and split up particular observations about Cause A vs Cause B (i.e. make it easy for people who want to support AI safety but don’t care about veganism, or, vice versa, to track which candidates are good by their lights, rather than aggregating them into a general vector of Goodness).
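To make that concrete, here’s a minimal sketch (in Python, with names and structure I’m inventing purely for illustration, not describing any real tool) of how a watchdog writeup could keep observations, inferences, and causes separated, so a donor who only cares about one cause can filter down to it:

```python
# Hypothetical sketch of a watchdog writeup format. All names are invented;
# the point is the shape of the record, not the tooling.
from dataclasses import dataclass, field

@dataclass
class Observation:
    cause: str    # e.g. "ai_safety" or "animal_welfare", tracked separately
    summary: str  # what was directly observed (bill passed, public statement made)
    source: str   # public record, or "private (vouched for by X)" during low-key periods

@dataclass
class Inference:
    cause: str
    claim: str        # the watchdog's judgment, clearly labeled as a judgment
    confidence: str   # e.g. "low" / "medium" / "high"

@dataclass
class CandidateAssessment:
    candidate: str
    observations: list[Observation] = field(default_factory=list)
    inferences: list[Inference] = field(default_factory=list)

    def by_cause(self, cause: str) -> "CandidateAssessment":
        """Filter the writeup down to a single cause, for single-issue donors."""
        return CandidateAssessment(
            candidate=self.candidate,
            observations=[o for o in self.observations if o.cause == cause],
            inferences=[i for i in self.inferences if i.cause == cause],
        )
```

The same shape could work for both the private version circulated during the low-key period and the public retrospective writeup; only the source fields and the audience change.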
People in charge of PACs/similar need good judgment
The actual motivating example here was thinking about supporting PACs, as opposed to candidates.
I don’t actually understand PACs very well. But, as I understand it, they need to be deciding which candidates to support, which means you need all the same apparatus for evaluating candidates and thinking through longterm consequences.
Any broad political org needs a person in charge of it who is responsible for making sure it is high integrity. I have a particularly high bar for this.
If you want to run a PAC or org that gets money from a hypothetical High Epistemics/Integrity Political Bloc, it is not merely your job to “not lie” or “not mess up in the obvious ways.” Politics is hard mode. You need to be tracking the incentives, tracking whether your org is evolving into a moral maze, and proactively working to make sure it doesn’t get eaten by an egregore.
This requires taste, as well as effort.
Taste is hard to acquire. Often, “just try harder” won’t realistically work. If you don’t have good enough judgment, you either need to find another person to be in charge, or you might need to go try doing some projects that will enable you to learn from experience and become wiser / more cynical / etc.
Don’t share reputation / Watchdogs shouldn’t be “an org”
An earlier draft described this as “GiveWell for retroactive political action assessment”. But the word “GiveWell” implies there is an org. Orgs bundle up people’s reputations together, such that every person involved feels pressure not to risk the reputation of everyone else at the org. This has been a failure mode at OpenPhil (from what I understand).
Watchdogs will need to make some tradeoff on gaining access to private information, vs making various promises and compromises. But, they can do that individually, so the results aren’t as contagious.
Different “Watchdogs” and “Rally-ers”
I would ideally like everyone involved to have maximally good epistemics. But, in order for this to succeed, you need some people who are really good at rallying large numbers of people to do a thing (e.g. donate to candidates, vote). Rallying is a different skill from maintaining-good-epistemics-while-evaluating. It’s hard to be good at both, because a) it’s just generally harder to have two skills than one, and b) “rallying” often does require a mindset that is more Mindkiller-y.
So, I would like at least some people who are spec’d into “watchdog-ing”/”evaluation”, who are not also trying to rally people.
I want the rally-ers to be more careful on the margin. I think it is possible to skill up at inspiring conviction/action without having distorted beliefs. But, I think the project can still work even if the rally-ers aren’t maximally good at that.
Watchdog-ing the ecosystem, not just candidates
One way for this to fail is for individual candidates to turn out to be grifters who extract money, or sociopaths who end up net-negative.
Another way for this to fail is for the system to become subtly corrupted over time, making individual little compromises that don’t seem that bad but add up to “now, this is just a regular ol’ political bloc, with the word ‘epistemics/integrity’ taped to the front door.”
There need to be watchdogs who are modeling the whole ecosystem, and speaking out if it is sliding towards failure.
Donors/voters have a responsibility not to get exploited
It’s not enough for watchdogs to periodically say “hey, this candidate seems sus” or “we seem to be sliding towards worse epistemics collectively.” The people voting with their ballots or wallets need to actually care. This means a critical mass of them need to actually care about the system not sliding towards corruption.
Prediction markets for integrity violation
This could be an entirely separate idea from “watchdog evaluators”, but it dovetails nicely. For candidates that a powerful high-integrity political bloc is trying to help, it probably makes sense to have public prediction markets about whether they will keep their word about various promises.
If individual watchdogs gain a track record for successfully noticing “so and so is going to betray their principles” and “so and so probably won’t betray their principles”, those people can also then maybe be trusted more to represent private information (“I talked to Candidate Alice, and I really do get a sense of them knowing what they’re talking about and committing to Cause A”).
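As a sketch of what such a market question might need to pin down (I’m inventing this format for illustration; it isn’t describing any existing prediction market platform), the important parts are a named resolver and resolution criteria written in advance:

```python
# Hypothetical sketch of a commitment-violation market question.
# Field names, the candidate, and the commitment are all invented for illustration.
from dataclasses import dataclass
from datetime import date

@dataclass
class CommitmentMarket:
    subject: str              # whose commitment is being tracked
    commitment: str           # the specific public commitment
    resolution_date: date     # when the market resolves
    resolver: str             # a named individual, so a track record can accumulate
    resolution_criteria: str  # written in advance, to limit post-hoc goalpost moving

example = CommitmentMarket(
    subject="Candidate Alice",  # hypothetical candidate from the example above
    commitment="Co-sponsors a frontier-AI transparency bill within two years of taking office",
    resolution_date=date(2028, 1, 1),
    resolver="A watchdog with a public track record of past resolutions",
    resolution_criteria=(
        "Resolves YES if the resolver judges the commitment was kept; "
        "the resolver publishes their reasoning either way."
    ),
)
```

The named-resolver part is doing most of the work here: it’s what lets the track-record-building described above actually accumulate.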
The main problem with doing that publicly is that powerful people might be vindictive about it. I’m most worried about people being vindictive when they kind of grew up with the rationalsphere, so having rationalists criticize them or estimate them as low integrity feels personal, rather than just a cost of doing business as a politician.
I do think the norm and vibe should be “this is a cost of doing business. If you want money/support from the high integrity political engine, you should expect people to be evaluating you, this is nothing personal, the standards are very exacting and you may not meet them.”
Handling getting sued for libel
A problem I’m not 100% sure how to handle is getting sued for evaluating people/orgs as sociopathic.
I’m not sure what the legal standing is, if a prediction market reads:
“Within 5 years, I will judge that OpenAI’s nonprofit board no longer has teeth”
or
“Within 5 years, I will think [Candidate X] betrayed a campaign promise.”
or:
“Within 5 years, CEO Charlie will have violated one of the principles they established.”
A serious political engine could have money to defend against lawsuits, but, also, the more money you have, the more it’s worth suing you. (I think at the very least having someone who specializes in handling all the hassle of getting sued would be worth it).
My hope is that, unlike previous instances of people trying to claim an individual did bad things, this project is in some sense “big enough to be clearly worth protecting” (whereas when a random person in a vague community scene is maybe a bad actor, nobody is incentivized to make it their job to defend the people pointing that out).
LessWrong is for evaluation, and (at best) a very specific kind of rallying
Sometimes people get annoyed that LessWrong isn’t letting them do a particular kind of rallying, or saying something with one voice. They read Why Our Kind Can’t Cooperate and are like “okay, so, can we have a culture where people publicly support things and there isn’t this intense allergic criticism?”.
I think maybe there should be another forum or tool for doing that sort of thing. But, it’s definitely not LessWrong’s job. LessWrong definitely should not be synonymous with a political agenda.
I think posts like these are fine and good:
I feel wary of posts like this:
Statement of Support for “If Anyone Builds It, Everyone Dies”
We must be very clear: fraud in the service of effective altruism is unacceptable
I think the difference is:
Posts that argue the object level of ‘this candidate or project will have good/bad consequences’ are fine.
Some things that are less fine include: trying to shame people without making arguments, laying plausibly deniable traps that make it hard to argue back, or trying to establish a false consensus.
Posts that argue about what is socially acceptable to think/say on LessWrong ARE fine. The difference between this and the previous one can be subtle. I still find John Wentworth’s comments from Power buys you distance from the crime pretty good:
> Who’s at fault for the subcontractor’s slave labor?
> [...] My instinct says DO NOT EVER ASK THAT QUESTION, it is a WRONG QUESTION, you will be instantly mindkilled every time you ask “who should be blamed for X?”.
> … on reflection, I do not want to endorse this as an all-the-time heuristic, but I do want to endorse it whenever good epistemic discussion is an objective. Asking “who should we blame?” is always engaging in a status fight. Status fights are generally mindkillers, and should be kept strictly separate from modelling and epistemics.
> Now, this does not mean that we shouldn’t model status fights. Rather, it means that we should strive to avoid engaging in status fights when modelling them. Concretely: rather than ask “who should we blame?”, ask “what incentives do we create by blaming <actor>?”. This puts the question in an analytical frame, rather than a “we’re having a status fight right now” frame.
To be clear, LessWrong doesn’t prevent you from posting rallying / status-fighty / social-reality-manipulating posts. But it is set up to discourage them on the margin, and to prevent a lot of the upside from trying to do it. You won’t be on the frontpage, you won’t get curated, etc. If it seems like you’re doing it in a way that mods think is bad for the culture, we might yell at you.
(But also note, I did not run this by the rest of the Lightcone team and we have a policy of speaking for ourselves, since orgs don’t actually have “beliefs”)
Recap
Just to restate all the premises in one place:
A political bloc is a system that coordinates lots of people to produce a political outcome. (If you don’t need to coordinate lots of people, you just have a political actor, not a bloc)
It’s hard to build a high integrity/epistemics political bloc, because:
There is a pull towards mutual reputation alliances
There are incentives to gain power that distort our thinking
There are incentives towards simple grift.
You need access to private information (which often lives within the mutual reputation alliances)
Powerful people might try to punish you for exposing subtle character flaws
Enemies will be trying to sabotage you, while maintaining plausible deniability
And this all needs to keep working longterm, if you want a longterm powerful impact, so, it needs to be robust to all the adversarial failure modes.
Some potential solutions:
Have private evaluator people who check in on whether candidates seem good, and whether the whole political bloc seems sane.
Avoid sharing reputation as much as possible, so people feel more free to speak/think independently.
Maybe try prediction markets for commitment-violation.
Donors/voters will need to decide which candidates to support, and need to actually be trying to form their own judgments to avoid getting consumed by an egregore.
Maybe I should spell out some background/inside-baseball context:
In the past year, I’ve been aware of some waves of effort to coordinate ~hundreds of people to donate to political candidates, that successfully raised a lot of money, generally through a mix of circulating google docs making arguments, and manually DMing thousands of people through existing social networks.
It was all pretty lowkey, for the reasons stated in this post.
This accomplished some pretty impressive stuff. When I write this post, I’m not like “it’d be cool if some magical rationalist/EA political arm came out of nowhere.” I’m like “It’d be cool if the existing political networks that are actually pretty competent also developed some processes for group epistemics and integrity (while handling the reality that they’re operating in an adversarial environment). Here are some ideas on how to do that.”
A lot of the comments here seem to be responding to this as if it were pie-in-the-sky ideation. I’m approaching it from a pretty brass-tacks, practical mindset, but it makes sense if that feels weird to most readers.
Good post, but do you really want to use the specific term political machine for this? “Bloc” or “institution” seem more like what you’re talking about here, unless I’m misunderstanding.
Huh, I never knew that term had any such narrowly construed meaning. Good to know! Given that, I agree that this is a bad name for the thing Ray is talking about.
Ah, alas. Well I just replaced most of the instances with “bloc” but I’m not sure if the connotations of that are quite right either.
I thought “political machine” was a category that included the corrupt bad thing in the wiki article but also included other things, and, man now I’m not sure there’s even a good word for exactly what I want.
How bad is this, actually? My impression is that everyone complains about politicians lying, but in practice it doesn’t cost you much. E.g. no one says “well I otherwise like him but he betrayed this position I hate, so I won’t support him.”
Well, I said “harder to fake”, not ironclad or “sufficiently hard to fake.” It’s better than “in private, he said he cared about My Pet Cause.”
I do think people sometimes get mad at and vote out politicians that betrayed a principle they care about, esp. if they are a single-issue voter.
I think this definitely can happen. The Liberal Democrats in the UK (third party, colloquially the Lib Dems) gained support in the 2010 election by pledging not to vote for increases in tuition fees for UK universities. They were able to form a coalition government with the Conservative Party after the 2010 election. The Lib Dems hadn’t been in power before (and their predecessor, the Liberal Party, hadn’t been in power since the 1920s).
The coalition government increased tuition fees by a factor of 3. There were widespread protests at universities, mostly peaceful though one bloke did lob a fire extinguisher off a roof. The Liberal Democrats never recovered, going from 57 seats to 8 at the next election in 2015. I was a member of the party around 2017, and many people I spoke to said “I used to vote for them but then they increased tuition fees”.
This doesn’t prove a causal link (people might have just been annoyed at the coalition government’s austerity policies, and used tuition fees as a totem of that) but people do literally say “I liked Nick Clegg but he betrayed his position on tuition fees and I’ll never vote Lib Dem again.”
I think quantifying how strong a particular signal is will be an important part of having a self-correcting political coalition. My guess is that public statements of this kind are epsilon, but I’m open to data showing I’m wrong.
The reason I thought they were non-epsilon was “it sure seems like people are not willing to go on record as saying AI x-risk is important” (going on record in favor of “AI safety” is probably easy).
Generally, I think going on record saying something outside the Overton window counts for nontrivial integrity. (But, this isn’t really what I said in the post, to be fair.)
(edit: agreed that quantifying how strong a signal is is important, and I’m not at all sure the strengths I implied here are correct, although I think they’re right at least in a relative sense)
I think it helps if
1. The standards are legible, and clearly written down.
2. The standards are regarded as a separate category from “are you a good guy”.
Like, it should feel like “yep, trying to meet these standards is an actual cost, and it might not be worth it for you, and that doesn’t mean that we can’t be friendly, or be allies, but it does mean that you don’t get to have the resources that are allocated specifically for the people who decide to buy into these costs (in exchange for the support).”
It helps if they’re legible and written down because otherwise they will tend to slide into “are you a good guy”.
Ah, well, I agree that’s the dominant failure mode, but I think we are on the harder level of “there are opaque, gestalt Be-A-Good-Guy-For-Actualz qualities that are hard to distill into rules, and you need to successfully suss out ‘are they actually a good guy?’ without instead just asking ‘do they vibe like a good guy?’”
(I don’t know that I believe that exactly for this context, but, I think that’s at least often a situation I find myself in)
I don’t have much to say except that I think it would be good to create a bloc with the proposed goals and standards, but that it would be hard to adhere to those standards and get anywhere in today’s politics.
Also, if I was an American and interested in the politics of AI, I would be interested in the political stories surrounding the two movements that actually made a difference to executive-branch AI policy, namely effective altruism during the Biden years and e/acc during Trump 2.0. I think the EAs got in because the arrival of AI blindsided normie society and the EAs were the only ones who had a plan to deal with it, and then the e/accs managed to reverse that because the tech right wanted to get rich from a new technological revolution, and were willing to bet on Trump.
Also, for the record, the USA actually had a state politician who was a rationalist, ten years ago.
Huh. Presumably there’s some space for proposing and arguing for norms that you think should become part of the culture. (Noting that this is a very importantly different thing from shaping the culture by making normative statements in a way that ambiguously implies they are already socially normative, which shifts people’s expectations about what behavior is rewarded, praised, punished, or shamed.)
Yeah my statement as-worded is too strong. (Actually, upon reflection I am surprised I didn’t trigger Oliver coming in and saying “WHAT!? Ray that is wrong!”)
Yeah, posts saying “hey, here are some reasons I think this should be a norm” are extremely fine. Posts doing that while also using emotionally laden language are, like, kinda fine depending on context. Posts that are directly wielding shame and implying anyone who disagrees with the post is a bad person in ways that feel socially hard to push back against are generally not fine.
It was honestly a statement that was so clearly wrong in the straightforward interpretation that my guess was you obviously meant to convey something different than the obvious interpretation, but then I didn’t really put in the effort to reflect on that.
(I nearly wrote a comment saying it was blatantly wrong. For instance, I think I often write things with the intent to expand the space of things that people feel comfortable saying on LessWrong. But I didn’t comment because I felt bad being so aggro about one line in a post I hadn’t read, in case I misunderstood the context.)
This seems like the sort of thing that would be feasible to do publicly, at least in large part, since politicians make many public decisions. You already have fact checkers like PolitiFact which rates the veracity of individual statements and also tracks promises made and kept by presidents, although keeping promises isn’t the same thing as honesty or standing by principles (there are lots of reasons why it’s hard for presidents to keep promises).
I don’t think PolitiFact actually does the thing you’re pointing at here, but it’s a proof of concept that it’s possible to do similar things. A PolitiFact-esque org could track politicians’ honesty and sticking-to-principles-ness.
Remembering that PolitiFact exists actually makes me more worried about this. My impression is PolitiFact started off reasonably neutral and then veered into being a partisan mouthpiece.
(I am maybe less worried about a version that has more specific goal of “decide between candidates you plausibly like”, but, it’s the sort of the thing that would have a natural tendency to turn into an org, and then get audience-captured)
Random aside: I did recently find out about a thing called “Center for Effective Lawmaking” that seems to rate legislators based on how well they accomplish the policies they (or their party?) set out to do. I haven’t looked into it at all but it seemed like another angle from PolitiFact of “past example of someone trying to do the thing”.
https://thelawmakers.org/find-representatives
That sounds very useful, but it also sounds like an intractably difficult thing to figure out, even putting aside the issue of motivation that you brought up.
Suppose I’m a politician with a single issue that I care about. Is sponsoring ten bills on that issue a plus even if none of them get passed? But what if I’m in a party facing a 49-51 minority, and no bill I sponsor is going to get passed regardless of what I do—is there no way to evaluate me, if non-passed legislation doesn’t count? What about my ability to shift the Overton Window, or talk my party members into supporting my position in the future? I would imagine that the target function looks like...
$$\int_{t=\text{now}}^{\infty} \text{policy\_alignment}_t \, dt \;-\; \int_{t=\text{now}}^{\infty} \text{policy\_alignment}'_t \, dt$$

...where policy_alignment is some approximation of the U.S. government’s alignment with my stated positions, and policy_alignment' is the same, but in a counterfactual world where my opponent won the general election, such that my political impact across the entire collective future of America is computed. But that’s a very difficult function to build a proxy for, because, even putting the credit assignment problem aside, who can say whether a line item giving my issue of choice $50 million in federal funding is going to have more long-term influence than a high-profile showdown that wins me no friends on Capitol Hill but forces the next president to adopt it as part of his platform, let alone backing down quietly on the line item in exchange for a favor from a senior congressman that I can use later on?

This suggests that the organization that has the money to defend against lawsuits is not the same organization as the organization making the potentially libelous claims.
There are broad organizations like Foundation for Individual Rights and Expression (FIRE) that can do that. You could also fund an organization that’s more specific about defending people within your coalition.
Yeah, to be clear I have not thought that hard about how to handle the lawsuits. Even with a functioning lawsuit defense org-thingy, I think Evaluator People will probably need to have courage / conflict-readiness, and part of the post is a call for that.
I think the best model here is a constellation of individuals and micro-orgs, and some donors who are serious about supporting the entire endeavor (which does unfortunately involve some modeling of what counts as “the endeavor”).
“Medieval institutions” seem like they have a lot of room for improvement. Can we build high integrity/epistemics institutions at all? Or are we still lacking some key social tech we’ve yet to find or develop? (Do we e.g. need Rationalist alternatives to “mutual reputation alliances”?) There seem to have been several attempts by the Rationalists, with mixed results. LessWrong itself being one.
Can anyone suggest some of which have been the most successful and why? Which have failed spectacularly and why? Which are failing more subtly? (Or if this has been answered or partially answered already, can you link?)
Do we have a catalog of social tech that might help with institution building? E.g., I heard about prediction markets and dominant assurance contracts first on LW, etc. Seems like a lot of game theory is applicable.
I think, beyond the issues you bring up, there is the probability that people who care about integrity, epistemics, and competence may have fundamentally different core interests that make coalitions between them impossible. In other words, many people sincerely care about epistemics as a means, but their end goals are so wildly different that one faction’s idea of “successful AI deployment, followed by total eudaimonia” is another’s idea of “disastrous AI misuse that is on the same order of badness as human extinction”. Even a universal pause is difficult to negotiate, as, if the terms of the pause increase the eventual probability of faction A’s desired outcome, faction B wouldn’t consider it an improvement. Two sets of truly sincere, selfless, and intelligent people may still have entirely incompatible value functions.
Historically, formally bipartisan movements that have found relevance have often been torn apart as they decide on the often-binary question of which major political bloc is a more acceptable coalition partner. Others have survived this by more-or-less explicitly taking a side, but been subsumed completely shortly thereafter. The fate of Sierra Club springs to mind, with a fracture following David Gelbaum’s $200M donation and subsequent ultimatum on immigration. Following the fracture, the victorious faction used its newly-established internal power to transform it into an intersectional left-leaning activist organization rather than a bipartisan group of environmentalists. It could be argued that the backlash against this shift was catastrophic for the state of environmentalism on the political right, and thus the political fate of environmentalism in general. It could also be argued that acceptance of Gelbaum’s donation and Sierra Club’s subsequent vassalization to a major party allowed the Sierra Club to take actions that it might otherwise have been unable to take.
The debate over whether to support Scott Wiener is a more immediate example. If his name became associated with a political upstart, then empowering that group would amount to empowering him, and many people are categorically unwilling to do that. For those aligned with him politically, consider what his Republican counterpart would look like, and consider whether you would be willing to donate money to a group that proudly endorsed his campaign because he reliably backed AI safety bills. Moreover, consider what this candidate’s idea of “Aligned” AI would look like, and whether this is an improvement relative to your current expectations of the future.
If there is a single-issue group that has managed to avoid the fates described and still meaningfully exercise influence, it would be worth investigating how they did so.
Have you ever read Tocqueville’s “Democracy in America”? You can have as many checks and balances and clever system frameworks as you want, but at some point you just need people to believe in democracy. That is to say, your model of good government is never going to be complete without some kind of model of culture and shared values.
Maybe talking about culture—which is fuzzy, historical, symbolic, and social—is outside your wheelhouse. So what? Maybe clever system design and creating good incentive structures is outside my competency—that doesn’t mean I can create meaningful political change by writing fiction alone or going to rallies. Don’t fall into the failure mode of: ‘to the hammer everything looks like a nail.’
To concretise this, consider your proposal “Have private evaluator people who check in on whether candidates seem good, and whether the whole political bloc seems sane.”
Who watches the watchmen? Who ensures they are sane and good? No-one, it can’t be checks-and-balances all the way down: at some point you just need to trust experts will have high integrity and aligned values. Not everyone can be elected, not everyone can be hired by a market process, not everyone can be monitored all the time, an objective bureaucratic examination can never filter for everything you want in a leader.
Does that mean your proposals are wrong or unhelpful? No, they seem well thought-out and plausible. It is merely to say your world-model seems to be missing a few gears.
The watchmen publish their work publicly, people read the writeups and check that they make sense, pretty much anyone can become a watchman if they want. (The whole idea is that the writeups can be private during low-key-fundraising periods but public afterwards so it’s easier to sanity-check. Also, during the lowkey periods, there can be private mailing lists for discussion)
The answer to “who watches the watchmen” is “distributed spotchecks by readers.” Will that be perfect? No. It just has to be good enough to make it worth making a lot more political donations at scale.
It feels like you’re a) assuming I’m more absolutist about this than I am, b) that I haven’t thought about the stuff you mention here, and I don’t really know why.
I think this might explain the difference in framing? From the quote below, I assumed you were trying to come up with a fully general solution to the problem you specify:
But I see now that you were taking the existence of a community of sane, reasonable, and mostly value-aligned participants as a given, and instead focusing on a framework which could make their interaction with the wider political process saner.
The uncharitable reading is that you are assuming a can opener, but, from reading your other writing, evidently the better reading is that you do have a model for producing/widening this community elsewhere (inter alia).
Yeah. I’d phrase it as “reasonably sane, reasonably reasonable, and reasonably value-aligned.” I don’t think the LW commentariat is perfect, but I think they are within a basin where “aiming for a sane political coalition” is a reasonable aspirational goal. (And, while I’d like to succeed at the most ambitious version of the thing, all it needs to do to succeed is “be a better use of people’s time/attention than other things,” given that there are pretty compelling alternatives.)
I know a lot of people around here with similar-ish political goals, and similar-ish ideals of what you might hope a rationalist political bloc to look like, such that “okay, translate that into implementation details” feels like the right step.
Here in NZ, we have a minor party that’s been trying to win influence and elections, called The Opportunities Party (TOP). TOP has not been very successful, and I believe it’s due to not having a core identity. Trying to resonate with people on the basis of Epistemics isn’t very effective in practice.
In this case, I’m not saying “let’s make an Epistemics Party.” I’m saying, rationalsphere people who agree on political goals should coordinate to achieve those goals as effectively as possible (which includes preserving rationality).
I expect this to look more like supporting ordinary Democrat or Republican candidates (in the US), or otherwise mostly engaging with the existing political apparatus.
Perhaps a PAC run on dath ilan-style liquid democracy principles.
A quick heuristic to filter low-integrity politicians: check if they take bribes. The vast majority of U.S. politicians openly take legal bribes to their campaign funds, so this is a strong filter.
The US system is built so that taking campaign donations is important for getting elected. It’s not like, for example, the German system, where a politician can rely on the support of their party to get into parliament.
If you want to affect the US political system, it’s likely necessary to cooperate with politicians who do take donations to their campaign funds.
It’s also worth noting that donations to PACs are probably worse than direct campaign donations when it comes to lobbyist power.
I have a very simple opinion about why pushes for AI regulation fail. Ready?
Because nobody knows what they are asking for, or what they are asking for is absurd.
Here are the absurdities:
“Stop all AI research forever”
“Stop all AI research until it’s safe to do, and I say when it’s safe (i will never say it’s safe)”
“Stop all AI research except mine, where I sit around and think about it”
“Pause all AI research until we have a functional genetic engineering project so our smarter descendants can do AI research”
Once we get past absurd asks we get into easily coopted asks, like “restrict compute” which turns into “declare an arms race with the largest industrial power on the planet” and “monitor training runs” which turns into “tell every potential competitor you are going to have the government crush them”.
What will convince me there is a sane bloc pushing for AI-related regulations is when they propose a regulation that is sane. I cannot emphasize enough how much these things have failed at the drafting stage. The part everyone is allegedly good at, that part, where you write down a good idea that it might be a good idea to do? That’s the part where this has failed.