Mottes and Baileys in AI discourse

This post kinda necessarily needs to touch multiple political topics at once. Please, everyone, be careful. If it looks like you haven’t read the LessWrong Political Prerequisites, I’m more likely than usual to delete your comments.


I think some people are (rightly) worried about a few flavors of Motte and Bailey-ing with the IABIED discourse, and more recently, with the Superintelligence Statement.

With If Anyone Builds It:

“Sure, the motte is ‘If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.’” But I feel like I’m being set up for a bailey like:

“and… Eliezer’s frame on how to think about the problem is exactly right” [1]

Or, longer term, not necessarily from the MIRI folk:

“Let’s build a giant regulatory apparatus that not only stifles technological progress but also is too opaque and bureaucratic to even really solve the problems it’s purportedly supposed to solve.”

I started writing this post during the IABIED discourse, and didn’t get around to publishing it before the discourse mostly moved on and it felt slightly dated. But, today, I read Dean W. Ball’s tweet about the Superintelligence Statement:

right, but since you and I both know that this statement is a prelude to “and we should have global governance institutions hold the exclusive monopoly over the technology,” and definitely there is not consensus about that, it is “counterproductive,” as I put it, to create a false appearance of consensus toward what is in fact a radical and dangerous policy objective

These seem like totally reasonable concerns to me. Seems good to talk about.

Problem: Multi-Stage Motte and Bailey

Hedge drift and advanced motte-and-bailey is a pretty good reference for some of the concerns:

Motte and bailey is a technique by which one protects an interesting but hard-to-defend view by making it similar to a less interesting but more defensible position. Whenever the more interesting position—the bailey—is attacked, one retreats to the more defensible one—the motte—but when the attackers are gone, one expands again to the bailey.

In that case, one and the same person switches between two interpretations of the original claim. Here, I rather want to focus on situations where different people make different interpretations of the original claim. The originator of the claim adds a number of caveats and hedges to their claim, which makes it more defensible, but less striking and sometimes also less interesting. When others refer to the same claim, the caveats and hedges gradually disappear, however, making it more and more bailey-like.

A salient example of this is that scientific claims (particularly in messy fields like psychology and economics) often come with a number of caveats and hedges, which tend to get lost when re-told. This is especially so when media writes about these claims, but even other scientists often fail to properly transmit all the hedges and caveats that come with them.

Since this happens over and over again, people probably do expect their hedges to drift to some extent. Indeed, it would not surprise me if some people actually want hedge drift to occur. Such a strategy effectively amounts to a more effective, because less observable, version of the motte-and-bailey-strategy. Rather than switching back and forth between the motte and the bailey—something which is at least moderately observable, and also usually relies on some amount of vagueness, which is undesirable—you let others spread the bailey version of your claim, whilst you sit safe in the motte. This way, you get what you want—the spread of the bailey version—in a much safer way.

Even when people don’t use this strategy intentionally, you could argue that they should expect hedge drift, and that omitting to take action against it is, if not outright intellectually dishonest, then at least approaching that. This argument would rest on the consequentialist notion that if you have strong reasons to believe that some negative event will occur, and you could prevent it from happening by fairly simple means, then you have an obligation to do so. I certainly do think that scientists should do more to prevent their views from being garbled via hedge drift.

I think, on a good day, MIRI et al are pretty careful with their phrasings. But, not every day is a good day. Folk get rushed or triggered.

Also, there is a whole swath of doomer-ish people who aren’t really being remotely careful most of the time. And, on top of that, there is some kind of emergent toxoplasmic egregore, built out of the ecosystem that includes thoughtful doomers, thoughtful optimists, unthoughtful doomers, unthoughtful optimists, and twitter people hanging out nearby who don’t even really have a particular position on the topic.

I don’t actually have a specific ask for anyone. But, I know I myself am not sufficiently careful much of the time when I’m arguing on the internet, and I endorse people reminding me of this when it seems like I’m contributing to the sort of bad dynamics described in this post.

Some background examples

Two more examples: “Privilege” and “Intellectual Freedom.”

Privilege

In the early 00s, people would talk about “privilege”, noting that society is set up in ways that happen to benefit some people more than others. Some classes of people are systematically benefited, and others systematically harmed.

It’s useful to a) notice individual people/situations where someone is being harmed in a way that you have a hard time noticing or empathizing with, and b) note that there is a second-order effect of “society constantly denying that your problems are real problems”, which sucks in a particular way that’s even less obvious if you haven’t experienced it.

That’s the motte of privilege. The bailey, which at least a significant chunk of people seem to explicitly, reflectively endorse (let alone implicitly, accidentally), is “and the privileged people should feel guilty and take a bunch of costly actions and maybe get fired if they say the wrong thing.”

Intellectual Freedom

Sort of on the flipside: as social justice gained a lot of power in universities, a number of conservative-leaning intellectuals became frustrated and argued a bunch for free speech – that you should be able to talk about racial differences in IQ or sex differences in typical aptitudes.

This is partly because we just want to model the world at the object level and understand how biology and sociology etc work, and partly because intellectual taboos are sort of contagious – once you make one topic taboo, it becomes harder to talk about IQ at all, since you risk incidentally saying the wrong specific-taboo word. And there is general scope-creep of what gets taboo’d.

That’s the motte. The bailey, which at least a significant chunk of people seem to explicitly, reflectively endorse, is “and black people / women should be second class citizens in some ways.”

There isn’t an easy answer

I’ve watched the liberal and conservative memeplexes evolve over the past two decades. I have been pretty disappointed in people I would have hoped were being principled, who seemed to turn out to mostly just want to change the rules to benefit them and exploit the rules once society turned in their favor.

I think privilege is a pretty real, useful concept, in my personal relationships and in my community organizing. I think it’s an important, true fact that trying to make some intellectual ideas taboo has bad consequences.

But it seems pretty difficult, at humanity’s current collective wisdom level, to take those concepts at face value without them frequently spinning into broader political agendas. And, whether you think any of those agendas are good or bad, the main point here is that it’s a fact-of-the-matter that they do not currently exist in isolation.

What do we do about that? Ugh, I don’t know. It’s particularly hard at the broader societal level, where everything is very porous, enforcing discourse norms is nigh impossible, and so many people’s epistemics suck.

On LessWrong, we can at least say “Look, you need to have internalized the Political Prerequisites sequences. You are expected to make attempts to decouple, and to track Policy Debates Should Not Appear One-Sided, and Avoid Unnecessarily Political Examples, and such.” (And, try to hold each other accountable when we’re fucking it up).

But this doesn’t quite answer “what do we do, though, when the topic is necessarily politically charged, with consequences outside of LessWrong discussion fora?”

EA, “Out to Get You”, and “Giving Your All”

Quite a while ago, Zvi wrote Out to Get You, describing systems and memeplexes that don’t have your best interests at heart and want to exploit you as much as they can.

An important example is politics. Political causes want every spare minute and dollar. They want to choose your friends, words and thoughts. If given power, they seize the resources of state and nation for their purposes. Then they take those purposes further. One cannot simply give any political movement what it wants. That way lies ruin and madness.

Yes, that means your cause, too.

In the comments, I brought up a pattern I knew Zvi was worried about, which is the way Effective Altruism as a memeplex seems to want to encourage people to unhealthily “give their all.”

I noted: the problem is, it’s true that the world is on fire and needs all the help it can get. And there is a way to engage with that, where you accept that truth into yourself while also tracking your various other goals and needs and values. Aim to be a coherent person who enthusiastically helps where you can, but who also genuinely pursues other interests so you don’t throw your mind away, who maintains slack even when something feels like “an emergency”, and who genuinely holds onto the other things.

I think this is what much of EA leadership explicitly believes. And, I think it’s reasonable and basically correct.

Nonetheless, there are problems:

1) Getting to the point where you’re on board with Point A [i.e. a sane, healthy integration of your goals, including “the world is on fire”] often requires going through some awkward and unhealthy stages where you haven’t fully integrated everything. Which may mean you are believing some false things and perhaps doing harm to yourself.

Even if you read a series of lengthy posts before taking any actions, even if the Giving What We Can Pledge began with “we really think you should read some detailed blogposts about the psychology of this before you commit” (this may be a good idea), reading the blogposts wouldn’t actually be enough to really understand everything.

So, people who are still in the process of grappling with everything end up on EA forum and EA Facebook and EA Tumblr saying things like “if you live off more than $20k a year that’s basically murder”. (And also, you have people on Dank EA Memes saying all of this ironically except maybe not except maybe it’s fine who knows?)

And stopping all this from happening would be pretty time consuming.

2) The world is in fact on fire, and people disagree about the priorities, and about what things are acceptable to do in order to make that less the case. And while the Official Party Line is something like Point A, there’s still a fair number of prominent people hanging around who earnestly lean towards “it’s okay to hide costs, it’s okay to not be as dedicated to truth as Zvi or Ben Hoffman or Sarah Constantin would like, because it is Worth It.”

And, if EA is growing, then you should expect that at any given point, most of the people around are in the awkward transition phase (or not even realizing they should be going through an awkward transition phase), so they are most of the people you hear.

And that means they’re a pretty dominant force in the resulting ecosystem, even if the leadership was 100% perfectly nuanced.

This sort of concern is part of why I think LessWrong should specifically not try to be evangelical about rationality. This is a community with nuanced group epistemics, and flooding it with people who don’t get that would ruin it. I think EA sort of intrinsically needs to be at least somewhat evangelical, but it should heavily prioritize the fidelity of the message and not try to grow faster than it can stably manage.

IABIED doesn’t intrinsically need to be evangelical – it’s hypothetically enough for the right people to read the book and be persuaded to take the arguments seriously. But people, including/especially politicians, have trouble believing things that society thinks are crazy.

So, it’s the sort of memeplex that does actively want to be evangelical enough to get to a point where it’s a common, reasonable sounding position that “If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.”

Okay, back to “The Problem” as I conceive it

From my perspective, here’s what things look like:

  • The world is in peril.

  • It is not very close to being out of peril. (Even when I grant very optimistic assumptions about how nice/smooth takeoff goes, it still seems very bad for humanity[2])

  • It may be possible to scramble to leverage weakish AGI to get us out of peril, but, the people with the compute do not seem like they are asking the right questions, and it looks like many of the people who like this plan keep sliding off some key concepts.[3]

  • The political/coordination actions currently in the Overton Window are not really close to getting us out of peril. (At best, they get us to barely squeak by if we get lucky)

This all does not justify any given desperate action. But it does mean I am most excited about plans that route through “actually try to move the Overton Window in a very substantial way”, if those plans are actively good.

Within my worldview, an important aspect of the “Change Overton Window” plan is that humanity will need to do some pretty nuanced things. It’s not enough to get humanity to do one discrete action that you burn through the epistemic commons to achieve. We need an epistemic-clarity-win that’s stable at the level of a few dozen world/company leaders.

I think MIRI folk roughly agree with the above. I am less sure whether the Future of Life Institute people agree with that last bit. Regardless, the standard here is very challenging to meet, and even well-meaning people won’t be meeting it on a bad day. And there are all kinds of other people who are not trying to meet that standard at all, who will take your ideas and run with them.

If you disagree with the premise of “we’re pretty likely to die unless the political situation changes A Lot”, well, it makes sense if you’re worried about the downside risks of the sort of thing I’m advocating for here. We might be political enemies some of the time, sorry about that. Fwiw I do appreciate the substantial downsides of many potential solutions.

I don’t have a specific solution in mind, but here are some options:

  • Just let the motte-baileying play out however it’d normally play out

  • Avoid doing anything that ever risks multi-stage motte-and-bailey-ing

  • Generally be clear on what the bailey is (I think MIRI has pretty consistently done this, i.e. “yes, you should be willing to go to war over illegal datacenters.” But other groups have not always).

I think what I want is mostly for people pushing for agreement on simple statements to acknowledge this as a thing to watch out for, and to put at least some effort into pumping against it. And, for people who agree with the simple statements but are wary of them spiraling politically into something they disagree with, to cut the first group a bit of slack about it (but, like, not infinite slack).

  1. ^

    When Eliezer, Nate, or MIRI et al are at their best, I think they avoid actually doing this (largely by pretty overtly stating that the bailey is also important to them). But I don’t know that they’re always at their best, and whether they are or not, it seems reasonable from the outside to be worried about it.

  2. ^

    I have a post brewing that argues this point in more detail.

  3. ^

    If you think I’m explicitly wrong about this, btw, I’m interested in hearing more about the nuts and bolts of this disagreement. Seems like one of the most important disagreements.

  4. ^

    There’s an annoying problem where even writing out my concerns sort of reifies “there is a tribal conflict going on”, instead of trying to be-the-change-I-wanna-see of not looking at it through the lens of tribal conflict. But, the tribal conflict seems real and not gonna go away and we gotta figure out a better equilibrium and I don’t know how to do it without talking about it explicitly.

    In addition to the dicey move of “bringing up political examples”, I also need to psychologize a bit about people on multiple sides of multiple issues. Which is all the dicier because there’s not really a single person I’m psychologizing.