An epistemic advantage of working as a moderate
[epistemic status: the points I make are IMO real and important, but there are also various counterpoints; I’m not settled on an overall opinion here, and the categories I draw are probably kind of dumb/misleading]
Many people who are concerned about existential risk from AI spend their time advocating for radical changes to how AI is handled. Most notably, they advocate for costly restrictions on how AI is developed now and in the future, e.g. the Pause AI people or the MIRI people. In contrast, I spend most of my time thinking about relatively cheap interventions that AI companies could implement to reduce risk assuming a low budget, and about how to cause AI companies to marginally increase that budget. I’ll use the words “radicals” and “moderates” to refer to these two clusters of people/strategies. In this post, I’ll discuss the effect of being a radical or a moderate on your epistemics.
I don’t necessarily disagree with radicals, and most of the disagreement is unrelated to the topic of this post; see footnote for more on this.[1]
I often hear people claim that being a radical is better for your epistemics than being a moderate: in particular, I often hear people say that moderates end up too friendly to AI companies due to working with people from AI companies. I agree, but I think that working as a moderate has a huge advantage for your epistemics.
The main advantage for epistemics of working as a moderate is that almost all of your work has an informed, intelligent, thoughtful audience. I spend lots of time talking to and aiming to persuade AI company staff who are generally very intelligent, knowledgeable about AI, and intimately familiar with the goings-on at AI companies. In contrast, as a radical, almost all of your audience—policymakers, elites, the general public—is poorly informed, only able or willing to engage shallowly, and needs to have their attention grabbed intentionally. The former situation is obviously way more conducive to maintaining good epistemics.
I think working as a moderate has a bunch of good effects on me:
I’m extremely strongly incentivized to know what’s up. If I try to bullshit about AI, the people I’m talking to will notice that I’m bullshitting and judge me harshly.
I’m strongly incentivized to make arguments that I can justify. My interlocutors know enough about what’s going on that they can poke at particular parts of my argument and judge me harshly if the argument is flimsy. And if they’re not persuaded by an argument, I can go off and try to find arguments or evidence that will persuade them.
I don’t need to spend as much time optimizing for virality: my audience is mostly already willing to hear from me.
I don’t need to engage in coalitional politics where I make common cause with activists who are allied with me for some contingent reason.
Most of the spicy things I say can be said privately to just the people who need to hear them, freeing me from thinking about the implications of random third parties reading my writing.
I genuinely expect to change my mind as a result of conversations I have about my work. The people I talk to often have something to teach me.
I am not incentivized to exude confidence or other emotional affect. I don’t have to worry that if I caveat my arguments appropriately they won’t be as persuasive.
Because I’m trying to make changes on the margin, details of the current situation are much more interesting to me. In contrast, radicals don’t really care about e.g. the different ways that corporate politics affects AI safety interventions at different AI companies.
I have specific asks and can see how people respond to them. Radicals don’t really get to see whether people take specific actions based on their advocacy. I think this leads them to have greater risk of getting bullshitted by people who claim to be aligned with them but actually aren’t. (Though the moderates have also had substantial issues with this in the past.)
Radicals often seem to think of AI companies as faceless bogeymen thoughtlessly lumbering towards the destruction of the world. In contrast, I think of AI companies as complicated machines full of intelligent people, many of whom are well-meaning, that are thoughtlessly lumbering towards the destruction of the world due to some combination of ignorance, greed, and contemptible personal ambition. I think that the frustration and anger that I feel as a result of my work is more thoroughly textured than the frustration and anger that radicals feel.
Many people I know who work on radical AI advocacy spend almost all their time thinking about what is persuasive and attention-grabbing for an uninformed audience. They don’t experience nearly as much pressure on a day-to-day basis to be well informed about AI, to understand the fine points of their arguments, or to be calibrated and careful in their statements. They update way less on the situation from their day-to-day work than I do. They spend their time as big fish in a small pond.
I think this effect is pretty big. People who work on radical policy change often seem to me to be disconnected from reality and sloppy with their thinking; to engage as soldiers for their side of an argument, enthusiastically repeating their slogans. I think it’s pretty bizarre that despite the fact that LessWrongers are usually acutely aware of the epistemic downsides of being an activist, they seem to have paid relatively little attention to this in their recent transition to activism. Given that radical activism both seems very promising and is popular among LessWrongers regardless of what I think about it, I hope we try to understand the risks (perhaps by thinking about historical analogies) of activism and think proactively and with humility about how to mitigate them.
I’ll note again that the epistemic advantages of working as a moderate aren’t in themselves strong reasons to believe that moderates are right about their overall strategy.
I will also note that I work as a moderate from outside AI companies; I believe that working inside AI companies carries substantial risks for your epistemics. But IMO the risks from working at a company are worth conceptually distinguishing from the risks resulting from working towards companies adopting marginal changes.
In this post, I mostly conflated “being a moderate” with “working with people at AI companies”. You could in principle be a moderate and work to impose extremely moderate regulations, or push for minor changes to the behavior of governments. I did this conflation mostly because I think that for small and inexpensive actions, you’re usually better off trying to make them happen by talking to companies or other actors directly (e.g. starting a non-profit to do the project) rather than trying to persuade uninformed people to make them happen. And cases where you push for minor changes to the behavior of governments have many of the advantages I described here: you’re doing work that substantially involves understanding a topic (e.g. the inner workings of the USG) that your interlocutors also understand well, and you spend a lot of your time responding to well-informed objections about the costs and benefits of some intervention.
Thanks to Daniel Filan for helpful comments.
- ^
Some of our difference in strategy is just specialization: I’m excited for many radical projects, some of my best friends work on them, and I can imagine myself working on them in the future. I work as a moderate mostly because I (and Redwood) have comparative advantage for it: I like thinking in detail about countermeasures, and engaging in detailed arguments with well-informed but skeptical audiences about threat models.
And most of the rest of the difference in strategy between me and radicals is downstream of genuine object-level disagreement about AI risks and how promising different interventions are. If you think that all the interventions I’m excited for are useless, then obviously you shouldn’t spend your time advocating for them.
What’s your version of the story for how the “moderates” at OpenPhil ended up believing stuff even others can now see to be fucking nuts in retrospect and which “extremists” called out at the time, like “bio anchoring” in 2021 putting AGI in median fucking 2050, or Carlsmith’s Multiple Stage Fallacy risk estimate of 5% that involved only an 80% chance anyone would even try to build agentic AI?
Were they no true moderates? How could anyone tell the difference in advance?
From my perspective, the story is that “moderates” are selected to believe nice-sounding moderate things, and Reality is off doing something else because it doesn’t care about fitting in the same way. People who try to think like reality are then termed “extremist”, because they don’t fit into the nice consensus of people hanging out together and being agreeable about nonsense. Others may of course end up extremists for other reasons. It’s not that everyone extreme is reality-driven, but that everyone who is getting pushed around by reality (instead of pleasant hanging-out forces like “AGI in 2050, 5% risk”, which sounded very moderate to moderates before the ChatGPT Moment) ends up departing from the socially driven forces of what entitles you to sound terribly reasonable to the old AIco-OpenPhil cluster and hang out at their social gatherings without anyone feeling uncomfortable.
Anyone who loves being an extremist will of course go instantly haywire à la Yampolskiy, imagining that he has proven alignment impossible via a Gödelian fallacy so he can say 99.9999% doom. But yielding to the psychological comfort of being a “moderate” will not get you any further in science than that.
You can be a moderate by believing only moderate things. Or you can be a moderate by adopting moderate strategies. These are not necessarily the same thing.
This piece seems to be mostly advocating for the benefits of moderate strategies.
Your reply seems to mostly be criticizing moderate beliefs.
(My political beliefs are a ridiculous assortment of things, many of them outside the Overton window. If someone tells me their political beliefs are all moderate, I suspect them of being a sheep.
But my political strategies are moderate: I have voted for various parties’ candidates at various times, depending on who seems worse lately. This seems...strategically correct to me?)
How does one end up with moderate beliefs without relying on moderate strategies? (In less pressured fields, I could imagine this happening as a matter of course, but I am surprised if someone follows a reality-hugging strategy in AI and ends up believing ‘moderate things’.)
This is false as stated. The report says:
The corresponding footnote 179 is:
“Strong incentives” isn’t the same as “anyone would try to build”, and “agentic AI” isn’t the same as APS systems (which have a much more specific and stronger definition!).
I’d personally put more like 90% on the claim (and it might depend a lot on what you mean by strong incentives).
To be clear, I agree with the claim that Carlsmith’s report suffers from multi-stage fallacy (e.g., even without strong incentives to build APS systems, you can easily get AI takeover on my views) and is importantly wrong (and I thought so at the time), but your specific claim about the report here is incorrect.
I accept your correction and Buck’s as to these simple facts (was posting from mobile).
Are you talking in retrospect? If you currently assign only 90% to this claim, I would be very happy to take your money. (I would say a reasonable definition, which I think Joe would have accepted at the time, is that we would be dealing with more than $1 billion in annual expenditure towards this goal.)
I… actually have trouble imagining any definition that isn’t already met, as people are clearly trying to do this right now. But like, still happy to take your money if you want to bet and ask some third-party to adjudicate.
I wasn’t talking in retrospect, but I meant something much larger than $1 billion by strong incentives, and I really mean very specifically APS systems at the time when they are feasible to build.
The 10% would come from other approaches/architectures ending up being surprisingly better at the point when people could build APS systems. (E.g., you don’t need your AIs to have “the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining different forms of power over humans and the real-world environment”.)
On further consideration I might be more like 95%, but 90% doesn’t seem crazy to me depending on the details of the operationalization.
I would put very high probabilities on “people would pay >$50 billion for a strong APS system right now”, so we presumably agree on that.
It’s really key to my perspective here that by the time we can build APS systems, maybe something else which narrowly doesn’t meet this definition has come around and looks more competitive. There is something messy here because maybe there are strong incentives to build APS systems eventually, but this occurs substantially after full automation of the whole economy (or similar) by other systems. I was trying to exclude cases where APS systems only become strongly incentivized substantially after human intellectual labor is totally obsolete, as that case is pretty different. (And other factors like “maybe we’ll have radical superbabies before we can build APS systems” factor in too, though again this is very sensitive to operationalization.)
Pretty surprising that the paper doesn’t give much indication of what counts as “strong incentives” (or at least none that I could find after searching for 2 minutes).
Note that this post is arguing that there are some specific epistemic advantages of working as a moderate, not that moderates are always correct or that there aren’t epistemic disadvantages to being a moderate. I don’t think “there exist moderates who seem very incorrect to me” is a valid response to the post, just as “there exist radicals who seem very incorrect to me” wouldn’t be a valid argument for the post.
This is independent of the point Buck notes, that the label “moderate” as defined in the post doesn’t apply to 2020.
As a response to the literal comment at top-of-thread, this is clearly reasonable. But I think Eliezer is correctly invoking some important subtext here, which your comment doesn’t properly answer. (I think this because I often make a similar move to the one Eliezer is making, and have only understood within the past couple years what’s load-bearing about it.)
Specifically, there’s an important difference between:
“<person> was wrong about <argument/prediction/etc>, so we should update downward on deferring to their arguments/predictions/etc”, vs
“<person> was wrong about <argument/prediction/etc>, in a way which seems blindingly obvious when we actually think about it, and so is strong evidence that <person> has some systematic problem in the methods they’re using to think (as opposed to just being unlucky with this one argument/prediction/etc)”
Eliezer isn’t saying the first one, he’s saying the second one, and then following it up with a specific model of what is wrong with the thinking-methods in question. He’s invoking bio anchors and that Carlsmith report as examples of systematically terrible thinking, i.e. thinking which is in some sense “obviously” wrong when one is not trying-on-some-level to avoid real-thinking about it, and he’s specifically pointing to desire-to-appear-moderate as the likely primary factor which drove that systematically terrible thinking. He’s not invoking them merely as examples of people being wrong.
I think “this was blindingly obvious when we actually think about it” is not socially admissible evidence, because of hindsight bias.
I thought about a lot of this stuff before 2020. For the most part, I didn’t reach definitive conclusions about much of it. In retrospect, I think I was overconfident about a lot of the conclusions that I did provisionally accept, given the epistemic warrant.
Was I doing Actual Thinking? No, probably not, or at least not by many relevant standards. Could I have done better? Surely, but not by recourse to magical “just think better” cognition.
The fact remains that It Was Not Obvious To Me.
Others may claim that it was obvious to them, and they might be right—maybe it was obvious to them.
If a person declared an operationalized-enough-to-be-gradable prediction before the event was settled, well, then I can update that their worldview made correct predictions.
But if they additionally say that it was obvious and we all should have been able to tell, well, the fact that they say so doesn’t add any additional evidential weight. A person saying “it was blindingly obvious” doesn’t particularly distinguish the world where it actually was obvious, and I would have been able to tell if I had done Actual Thinking, from the world where it was a confusing question about the future that was hard to call in advance, and they happened to get this one right.
If you can show me how it’s obvious, such that it does in fact become obvious to me, that’s a different story.[1] But even then, the time to show that something is obvious is before reality reveals the answer.
It’s fine, I guess, for Eliezer and John to assert that some things were actually obvious, and we all should have been able to tell.
But I (and almost everyone else who didn’t call it as obvious in advance), should pay attention to the correct prediction, and ignore the assertion that it was obvious.
And, to be clear, both Eliezer and John have put enormous effort into trying to do that kind of communication. I can’t fault you for not attempting to show what you think you know.
Feels like there’s some kind of frame-error here, like you’re complaining that the move in question isn’t using a particular interface, but the move isn’t intended to use that interface in the first place? Can’t quite put my finger on it, but I’ll try to gesture in the right direction.
Consider ye olde philosophers who liked to throw around syllogisms. You and I can look at many of those syllogisms and be like “that’s cute and clever and does not bind to reality at all, that’s not how real-thinking works”. But if we’d been around at the time, very plausibly we would not be able to recognize the failure; maybe we would not have been able to predict in advance that many of the philosophers’ clever syllogisms totally fail to bind to reality.
Nonetheless, it is still useful and instructive to look at those syllogisms and say “look, these things obviously-in-some-sense do not bind to reality, they are not real-thinking, and therefore they are strong evidence that there is something systematically wrong with the thinking-methods of those philosophers”. (Eliezer would probably reflexively follow that up with “so I should figure out what systematic thinking errors plagued those seemingly-bright philosophers, and caused them to deceive themselves with syllogisms, in order to avoid those errors myself”.)
And if there’s some modern-day philosopher standing nearby saying that in fact syllogisms totally do bind to reality… then yeah, this whole move isn’t really a response to them. That’s not really what it’s intended for. But even if one’s goal is to respond to that philosopher, it’s probably still a useful first step to figure out what systematic thinking error causes them to not notice that many of their syllogisms totally fail to bind to reality.
So I guess maybe… Eliezer’s imagined audience here is someone who has already noticed that bio anchors and the Carlsmith thing fail to bind to reality, but you’re criticizing it for not instead responding to a hypothetical audience who thinks that the reports maybe do bind to reality?
I almost added a sentence at the end of my comment to the effect of…
“Either someone did see that X was blindingly obvious, in which case they don’t need to be told, or it wasn’t blindingly obvious to them, and they should pay attention to the correct prediction and ignore the assertion that it was obvious. In either case... the statement isn’t doing anything?”
Who are statements like these for? Is it for the people who thought that things were obvious to find and identify each other?
To gesture at a concern I have (which I think is probably orthogonal to what you’re pointing at):
On a first pass, the only people who might be influenced by statements like that are being influenced epistemically illegitimately.
Like, I’m imagining a person, Bob, who heard all the arguments at the time and did not feel confident enough to make a specific prediction. But then we all get to wait a few years and see how (some of the questions, though not most of them) actually played out, and then Eliezer or whoever says “not only was I right, it was blindingly obvious that I was right, and we all should have known all along!”
This is in practice received by Bob as almost an invitation to rewrite history and hindsight bias about what happened. It’s very natural to agree with Eliezer (or whoever) that, “yeah, it was obvious all along.” [1]
And that’s really sus! Bob didn’t get new information or think of new considerations that caused the confusing question to go from confusing to obvious. He just learned the answer!
He should be reminding himself that he didn’t in fact make an advance prediction, and remembering that at the time, it seemed like a confusing hard-to-call question, and analyzing what kinds of general thinking patterns would have allowed him to correctly call this one in advance.
I think when Eliezer gets irate at people for what he considers their cognitive distortions:
It doesn’t convince the people he’s ostensibly arguing against, because those people don’t share his premises. They often disagree with him, on the object level, about whether the specific conclusions under discussion have been falsified.
(e.g. Ryan saying he doesn’t think bio anchors was unreasonable, in this thread, or Paul disagreeing with Eliezer’s claims that ~“people like Paul are surprised by how the world actually plays out.”)
It doesn’t convince the tiny number of people who could see for themselves that those ways of thinking were blindingly obvious (and/or have a shared error pattern with Eliezer, that cause them to be making the same mistake).
(e.g. John Wentworth)
It does sweep up some social-ideologically doomer-y people into feeling more confidence in their doomerism and related beliefs, both by social proof (Eliezer is so confident and assertive, which makes me feel more comfortable asserting high P(doom)s), and because Eliezer is setting a frame in which he’s right, and people doing Real Thinking(TM) can see that he’s right, and anyone who doesn’t get it is blinded by frustrating biases.
(e.g. “Bob”, though I’m thinking of a few specific people.)
It alienates a bunch of onlookers, both people who think that Eliezer is wrong / making a mistake, and people who are agnostic.
In all cases, this seems either unproductive or counterproductive.
Like, there’s some extra psychological oomph in just how right Eliezer (or whoever) was and how wrong the other parties were. You get to be on the side of the people who were right all along, against OpenPhil’s oppressive distortionary forces / the power of modest epistemology / whatever. There’s some story that the irateness invites onlookers like Bob to participate in.
Ok, I think one of the biggest disconnects here is that Eliezer is currently talking in hindsight about what we should learn from past events, and this is and should often be different from what most people could have learned at the time. Again, consider the syllogism example: just because you or I might have been fooled by it at the time does not mean we can’t learn from the obvious-in-some-sense foolishness after the fact. The relevant kind of “obviousness” needs to include obviousness in hindsight for the move Eliezer is making to work, not necessarily obviousness in advance, though it does also need to be “obvious” in advance in a different sense (more on that below).
Short handle: “It seems obvious in hindsight that <X> was foolish (not merely a sensible-but-incorrect prediction from insufficient data); why wasn’t that obvious at the time, and what pattern do I need to be on the watch for to make it obvious in the future?”
Eliezer’s application of that pattern to the case at hand goes:
It seems obvious-in-some-sense in hindsight that bio anchors and the Carlsmith thing were foolish, i.e. one can read them and go “man this does seem kind of silly”.
Insofar as that wasn’t obvious at the time, it’s largely because people were selecting for moderate-sounding conclusions. (That’s not the only generalizable pattern which played a role here, but it’s an important one.)
So in the future, I should be on the lookout for the pattern of selecting for moderate-sounding conclusions.
I think an important gear here is that things can be obvious-in-hindsight, but not in advance, in a way which isn’t really a Bayesian update on new evidence and therefore doesn’t strictly follow prediction rules.
Toy example:
Someone publishes a proof of a mathematical conjecture, which enters canon as a theorem.
Some years later, another person stumbles on a counterexample.
Surprised mathematicians go back over the old proof, and indeed find a load-bearing error. Turns out the proof was wrong!
The key point here is that the error was an error of reasoning, not an error of insufficient evidence or anything like that. The error was “obvious” in some sense in advance; a mathematician who’d squinted at the right part of the proof could have spotted it. Yet in practice, it was discovered by evidence arriving, rather than by someone squinting at the proof.
Note that this toy example is exactly the sort where the right primary move to make afterwards is to say “the error is obvious in hindsight, and was obvious-in-some-sense beforehand, even if nobody noticed it. Why the failure, and how do we avoid that in the future?”.
This is very much the thing Eliezer is doing here. He’s (he claims) pointing to a failure of reasoning, not of insufficient evidence. For many people, the arrival of more recent evidence has probably made it more obvious that there was a reasoning failure, and those people are the audience who (hopefully) get value from the move Eliezer made—hopefully they will be able to spot such silly patterns better in the future.
That’s my model here as well. Pseudo-formalizing it: We’re not idealized agents, we’re bounded agents, which means we can’t actually do full Bayesian updates. We have to pick and choose what computations we run, what classes of evidence we look for and update on. In hindsight, we may discover that an incorrect prediction was caused by our opting not to spend the resources on updating on some specific information, such that if we had known to do so, we would have reliably avoided the error even while having all the same object-level information.
In other words, it’s a Bayesian update to the distribution over Bayesian updates we should run. We discover a thing about (human) reasoning: that there’s a specific reasoning error/oversight we’re prone to, and that we have to run an update on the output of “am I making this reasoning error?” in specific situations.
This doesn’t necessarily mean that this meta-level error would have been obvious to anyone in the world at all, at the time it was made. Nowadays, we all may be committing fallacies whose very definitions require agent-foundations theory decades ahead of ours; fallacies whose definitions we wouldn’t even understand without reading a future textbook. But it does mean that specific object-level conclusions we’re reaching today would be obviously incorrect to someone who is reasoning in a more correct way.
If someone predicts in advance that something is obviously false, and then you come to believe that it’s false, then you should update not just towards thought processes which would have predicted that the thing is false, but also towards thought processes which would have predicted that the thing is obviously false. (Conversely, if they predict that it’s obviously false, and it turns out to be true, you should update more strongly against their thought processes than if they’d just predicted it was false.)
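For what it’s worth, here’s a minimal toy sketch of that asymmetry, with entirely made-up likelihoods (the numbers and the “reliable”/“unreliable” labels are illustrative assumptions, not anything from the thread):

```python
# Toy Bayesian sketch (entirely made-up numbers) of the asymmetry described above:
# a prediction of "obviously false" should move you further, in both directions,
# than a plain prediction of "false".

# P(prediction | thought process, actual truth value of the claim)
likelihood = {
    "reliable":   {("obviously false", False): 0.40, ("false", False): 0.45,
                   ("obviously false", True):  0.02, ("false", True):  0.13},
    "unreliable": {("obviously false", False): 0.20, ("false", False): 0.30,
                   ("obviously false", True):  0.15, ("false", True):  0.25},
}
prior_reliable = 0.5

def p_reliable(prediction, claim_was_true):
    """Posterior that the predictor's thought process is reliable, after one observation."""
    num = prior_reliable * likelihood["reliable"][(prediction, claim_was_true)]
    den = num + (1 - prior_reliable) * likelihood["unreliable"][(prediction, claim_was_true)]
    return num / den

print(p_reliable("false", False))            # ~0.60: they were right; modest update up
print(p_reliable("obviously false", False))  # ~0.67: right, and bolder; bigger update up
print(p_reliable("false", True))             # ~0.34: they were wrong; modest update down
print(p_reliable("obviously false", True))   # ~0.12: wrong, and bolder; bigger update down
```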
IIRC Eliezer’s objection to bioanchors can be reasonably interpreted as an advance prediction that “it’s obviously false”, though to be confident I’d need to reread his original post (which I can’t be bothered to do right now).
I think this is wrong. The scenarios where this outcome was easily predicted given the right heuristics and the scenarios where this was surprising to every side of the debate are quite different. Knowing who had predictors that worked in this scenario is useful evidence, especially when the debate was about which frames for thinking about things and selecting heuristics were useful.
Or, to put this in simpler but somewhat imprecise terms: This was not obvious to you because you were thinking about things the wrong way. You didn’t know which way to think about things at the time because you lacked information about which predicted things better. You now have evidence about which ways work better, and can copy heuristics from people who were less surprised.
The argument “there are specific epistemic advantages of working as a moderate” isn’t just a claim about categories that everyone agrees exist, it’s also a way of carving up the world. However, you can carve up the world in very misleading ways depending on how you lump different groups together. For example, if a post distinguished “people without crazy-sounding beliefs” from “people with crazy-sounding beliefs”, the latter category would lump together truth-seeking nonconformists with actual crazy people. There’s no easy way of figuring out which categories should be treated as useful vs useless but the evidence Eliezer cites does seem relevant.
On a more object level, my main critique of the post is that almost all of the bullet points are even more true of, say, working as a physicist. And so structurally speaking I don’t know how to distinguish this post from one arguing “one advantage of looking for my keys closer to a streetlight is that there’s more light!” I.e. it’s hard to know the extent to which these benefits come specifically from focusing on less important things, and therefore are illusory, versus the extent to which you can decouple these benefits from the costs of being a “moderate”.
But (in the language of the post) both moderates and radicals are working in the epistemic domain, not some unrelated domain. It’s not that moderates and radicals are trying to answer different questions (and that the questions moderates are answering are epistemically easier, like physics). There are some differences in the most relevant questions, but I don’t think this is a massive effect.
That seems totally wrong. Moderates are trying to answer questions like “what are some relatively cheap interventions that AI companies could implement to reduce risk assuming a low budget?” and “how can I cause AI companies to marginally increase that budget?” These questions are very different from—and much easier than—the ones the radicals are trying to answer, like “how can we radically change the governance of AI to prevent x-risk?”
Hmm, I think what I said was about half wrong and I want to retract my point.
That said, I think many of the relevant questions are overlapping (like “how do we expect the future to generally go?”, “why/how is AI risky?”, “how fast will algorithmic progress go at various points?”), and I interpret this post as just talking about the effect on epistemics around the overlapping questions (regardless of whether you’d expect moderates to mostly be working in domains with better feedback loops).
This isn’t that relevant for your main point, but I also think the biggest question for radicals in practice is mostly: How can we generate massive public/government support for radical action on AI?
It might not be disproof, but it would seem very relevant for readers to be aware of major failings of prominent moderates in the current environment e.g. when making choices about what strategies to enact or trust. (Probably you already agree with this.)
I agree with this in principle, but think that doing a good job of noting major failings of prominent moderates in the current environment would look very different than Eliezer’s comment and requires something stronger than just giving examples of some moderates which seem incorrect to Eliezer.
Another way to put this is that I think citing a small number of anecdotes in defense of a broader world view is a dangerous thing to do and not attaching this to the argument in the post is even more dangerous. I think it’s more dangerous when the description of the anecdotes is sneering and misleading. So, when using this epistemically dangerous tool, I think there is a higher burden of doing a good job which isn’t done here.
On the specifics here, I think Carlsmith’s report is unrepresentative for a bunch of reasons. I think Bioanchors is representative (though I don’t think it looks fucking nuts in retrospect).
This is putting aside the fact that this doesn’t engage with the arguments in the post at all beyond effectively reacting to the title.
The bioanchors post was released in 2020. I really wish that you bothered to get basic facts right when being so derisive about people’s work.
I also think it’s bad manners for you to criticize other people for making clear predictions given that you didn’t make such predictions publicly yourself.
I generally agree with some critique in the space, but I think Eliezer went on the record pretty clearly thinking that the bio-anchors report had timelines that were quite a bit too long:
I think in many cases such a critique would be justified, but like, IDK, I feel like in this case Eliezer has pretty clearly said things about his timelines expectations that count as a pretty unambiguous prediction. Like, we don’t know what exact year, but clearly the above implies a median of at latest 2045, more like 2040. I think you clearly cannot fault Eliezer for “not having made predictions here”, though you can fault him for not making highly specific predictions (but IDK, “50% on AI substantially before 2050” is a pretty unambiguous prediction).
FWIW, I think it is correct for Eliezer to be derisive about these works, instead of just politely disagreeing.
Long story short, derision is an important negative signal that something should not be cooperated with. Couching words politely is inherently a weakening of that signal. See here for more details of my model.
I do know that this is beside the point you’re making, but it feels to me like there is some resentment about that derision here.
If that’s a claim that Eliezer wants to make (I’m not sure if it is!) I think he should make it explicitly and ideally argue for it. Even just making it more explicit what the claim is would allow others to counter-argue the claim, rather than leaving it implicit and unargued.[1] I think it’s dangerous for people to defer to Eliezer about whether or not it’s worth engaging with people who disagree with him, which limits the usefulness of claims without arguments.
Also, aside on the general dynamics here. (Not commenting on Eliezer in particular.) You say “derision is an important negative signal that something should not be cooperated with”. That’s in the passive voice, more accurate would be “derision is an important negative signal where the speaker warns the listener to not cooperate with the target of derision”. That’s consistent with “the speaker cares about the listener and warns the listener that the target isn’t useful for the listener to cooperate with”. But it’s also consistent with e.g. “it would be in the speakers interest for the listener to not cooperate with the target, and the speaker is warning the listener that the speaker might deride/punish/exclude the listener if they cooperate with the target”. General derision mixes together all these signals, and some of them are decidedly anti-epistemic.
For example, if the claim is “these people aren’t worth engaging with”, I think there are pretty good counter-arguments even before you start digging into the object-level: The people having a track record of being willing to publicly engage on the topics of debate, of being willing to publicly change their mind, of being open enough to differing views to give MIRI millions of dollars back when MIRI was more cash-constrained than they are now, and understanding points that Eliezer think are important better than most people Eliezer actually spends time arguing with.
To be clear, I don’t particularly think that Eliezer does want to make this claim. It’s just one possible way that “don’t cooperate with” could cash out here, if your hypothesis is correct.
He has explicitly argued for it! He has written like a 10,000 word essay with lots of detailed critique:
https://www.lesswrong.com/posts/ax695frGJEzGxFBK4/biology-inspired-agi-timelines-the-trick-that-never-works
Adele: “Long story short, derision is an important negative signal that something should not be cooperated with”
Lukas: “If that’s a claim that Eliezer wants to make (I’m not sure if it is!) I think he should make it explicitly and ideally argue for it.”
Habryka: “He has explicitly argued for it”
What version of the claim “something should not be cooperated with” is present + argued-for in that post? I thought that post was about the object level. (Which IMO seems like a better thing to argue about. I was just responding to Adele’s comment.)
I don’t think he is (nor should be) signaling that engaging with people who disagree is not worth it!
Acknowledged that that is more accurate. I do not dispute that people misuse derision and other status signals in lots of ways, but I think that this is more-or-less just a subtler form of lying/deception or coercion, and not something inherently wrong with status. That is, I do not think you can have the same epistemic effect without being derisive in certain cases. Not that all derision is a good signal.
Ok. If you think it’s correct for Eliezer to be derisive, because he’s communicating the valuable information that something shouldn’t be “cooperated with”, can you say more specifically what that means? “Not engage” was speculation on my part, because that seemed like a salient way to not be cooperative in an epistemic conflict.
My read is that the cooperation he is against is with the narrative that AI-risk is not that important (because it’s too far away or weird or whatever). This indeed influences which sorts of agencies get funded, which is a key thing he is upset about here.
On the other hand, engaging with the arguments is cooperation at shared epistemics, which I’m sure he’s happy to coordinate with. Also, I think that if he thought that the arguments in question were coming from a genuine epistemic disagreement (and not motivated cognition of some form), he would (correctly) be less derisive. There is much more to be gained (in expectation) from engaging with an intellectually honest opponent than one with a bottom line.
Hm, I still don’t really understand what it means to be [against cooperation with the narrative that AI risk is not that important]. Beyond just believing that AI risk is important and acting accordingly. (A position that seems easy to state explicitly.)
Also: The people whose work is being derided definitely don’t agree with the narrative that “AI risk is not that important”. (They are and were working full-time to reduce AI risk because they think it’s extremely important.) If the derisiveness is being read as a signal that “AI risk is important” is a point of contention, then the derisiveness is misinforming people. Or if the derisiveness was supposed to communicate especially strong disapproval of any (mistaken) views that would directionally suggest that AI risk is less important than the author thinks: then that would just seem like soldier mindset (more harshly criticizing views that push in directions you don’t like, holding goodness-of-the-argument constant), which seems much more likely to muddy the epistemic waters than to send important signals.
Yeah, those are good points… I think there is a conflict with the overall structure I’m describing, but I’m not modeling the details well apparently.
Thank you!
Except that Yudkowsky had actually made the predictions in public. However, he didn’t know in advance that the AIs would be trained as neural networks that are OOMs less efficient at keeping context[1] in mind. Other potential mispredictions are Yudkowsky’s cases for the possibility of greatly increasing capabilities starting from a human brain simulation[2], or of simulating a human brain working ~6 OOMs faster:
Yudkowsky’s case for a superfast human brain
The fastest observed neurons fire 1000 times per second; the fastest axon fibers conduct signals at 150 meters/second, a half-millionth the speed of light; each synaptic operation dissipates around 15,000 attojoules, which is more than a million times the thermodynamic minimum for irreversible computations at room temperature (kT₃₀₀ ln(2) = 0.003 attojoules per bit). It would be physically possible to build a brain that computed a million times as fast as a human brain, without shrinking the size, or running at lower temperatures, or invoking reversible computing or quantum computing. If a human mind were thus accelerated, a subjective year of thinking would be accomplished for every 31 physical seconds in the outside world, and a millennium would fly by in eight and a half hours. Vinge (1993) referred to such sped-up minds as “weak superhumanity”: a mind that thinks like a human but much faster.
However, as Turchin points out in his book[3] written in Russian, simulating a human brain requires[4] just 1e15 FLOP/second, or less than 1e22 FLOP/month.
Turchin’s argument in Russian
Для создания ИИ необходимо, как минимум, наличие достаточно мощного компьютера. Сейчас самые мощные компьютеры имеют мощность порядка 1 петафлопа (10¹⁵ операций с плавающей запятой в секунду). По некоторым оценкам, этого достаточно для эмуляции человеческого мозга, а значит, ИИ тоже мог бы работать на такой платформе. Сейчас такие компьютеры доступны только очень крупным организациям на ограниченное время. Однако закон Мура предполагает, что мощность компьютеров возрастёт за 10 лет примерно в 100 раз, т. е., мощность настольного компьютера возрастёт до уровня терафлопа, и понадобится только 1000 настольных компьютеров, объединённых в кластер, чтобы набрать нужный 1 петафлоп. Цена такого агрегата составит около миллиона долларов в нынешних ценах – сумма, доступная даже небольшой организации. Для этого достаточно реализовать уже почти готовые наработки в области многоядерности (некоторые фирмы уже сейчас предлагают чипы с 1024 процессорами) и уменьшения размеров кремниевых элементов.
ChatGPT’s translation into English
To create AI, at the very least, a sufficiently powerful computer is required. Currently, the most powerful computers have a performance of about 1 petaflop (10¹⁵ floating-point operations per second). According to some estimates, this is enough to emulate the human brain, which means that AI could also run on such a platform. At present, such computers are available only to very large organizations for limited periods of time. However, Moore’s Law suggests that computer performance will increase roughly 100-fold over the next 10 years. That is, the performance of a desktop computer will reach the level of a teraflop, and only 1,000 desktop computers connected in a cluster would be needed to achieve the required 1 petaflop. The cost of such a system would be about one million dollars at today’s prices—a sum affordable even for a small organization. To achieve this, it is enough to implement the nearly completed developments in multicore technology (some companies are already offering chips with 1,024 processors) and in reducing the size of silicon elements.
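As a quick sanity check of the figures quoted above (illustrative arithmetic only; the underlying estimates are Yudkowsky’s and Turchin’s, not mine):

```python
import math

# Landauer limit at 300 K, in attojoules per bit (1 aJ = 1e-18 J)
k_B = 1.380649e-23  # Boltzmann constant, J/K
landauer_aJ = k_B * 300 * math.log(2) / 1e-18
print(f"kT_300 ln 2 ≈ {landauer_aJ:.4f} aJ/bit")            # ~0.003 aJ
print(f"15,000 aJ / limit ≈ {15000 / landauer_aJ:.1e}x")    # >1e6, i.e. "more than a million times"

# A million-fold speedup fits a subjective year into ~31 physical seconds
seconds_per_year = 365.25 * 24 * 3600
print(f"{seconds_per_year / 1e6:.1f} s per subjective year")  # ~31.6 s

# Turchin's ~1e15 FLOP/s brain-emulation estimate, accumulated over a 30-day month
flop_per_month = 1e15 * 86400 * 30
print(f"{flop_per_month:.2e} FLOP per month")                 # ~2.59e21, i.e. < 1e22
```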
My take on the issues can be found in the collapsible sections here and here.
A case against the existence of an architecture more efficient than the human brain can be found in Jacob Cannell’s post. But it doesn’t exclude a human brain trained for millions of years.
Unfortunately, the book’s official translation into English is of too low quality.
Fortunately, the simulation requires OOMs more dynamic memory.
IMO, there’s another major misprediction (and I’d argue that we don’t even need LLMs to make it a misprediction): the prediction that within a few days/weeks/months we would go from AI that was almost totally incapable of intellectual work to AI that can overpower humanity.
This comment also describes what I’m talking about:
How takeoff used to be viewed as occurring in days, weeks, or months, from being a cow to being able to place ringworlds around stars:
(Yes, the Village Idiot to Einstein post also emphasized the vastness of the space above us, which is what Adam Scholl claimed, and I basically agree with that claim; the issue is that there’s another claim also being made.)
The basic reason for this misprediction is that, as it turns out, human variability is pretty wide, and the fact that human brains are very similar is basically no evidence to the contrary (I was being stupid about this in 2022):
The range of human intelligence is wide, actually.
And also, no domain has actually had a takeoff as fast as Eliezer Yudkowsky thought, in either the Village Idiot to Einstein picture or his own predictions; but Ryan Greenblatt and David Matolcsi have already made these arguments, so I merely need to link them (1, 2, 3).
Also, as a side note, I disagree with Jacob Cannell’s post, and the reason is that it’s not actually valid to compare brain FLOPs to computer FLOPs in the way Jacob Cannell does:
Why it’s not valid to compare brain FLOPs to computer FLOPs in the way Jacob Cannell does, part 1
Why it’s not valid to compare brain FLOPs to computer FLOPs in the way Jacob Cannell does, part 2
I generally expect it to be at least 4 OOMs better, which cashes out to at least 3e19 FLOPs per joule:
The limits of chip progress/physical compute in a small area assuming we are limited to irreversible computation
(Yes, I’m doing a lot of linking because other people have already done the work, I just want to share the work rather than redo things all over again).
@StanislavKrym I’m tagging you since I significantly edited the comment.
This is not a response to your central point, but I feel like you often criticize EAs somewhat unfairly over stuff like bioanchors. You often say stuff that makes it seem like bioanchors was released, all EAs bought it wholesale, bioanchors shows we can be confident AI won’t arrive before 2040 or something, and thus all EAs were convinced we don’t need to worry much about AI for a few decades.
But like, I consider myself an EA, and I never put much weight on bioanchors. I read the report and found it interesting; I think it’s useful enough (mostly as a datapoint for other arguments you might make) that I don’t think it was a waste of time. But not much more than that. It didn’t really change my views on what should be done, or on the likelihood of AGI being developed at various points in time, except on the margins. I mean, that’s how most people I know read that report. But I feel like you accuse the people involved of having far less humility and making way stronger claims than they actually do.
Notably, bioanchors doesn’t say that we should be confident AI won’t arrive before 2040! Here’s Ajeya’s distribution in the report (which was finished in about July 2020).
Yeah, to be clear, I don’t think that, and I think most people didn’t think that, but Eliezer has sometimes said stuff that made it seem like he thought people think that. I was remembering a quote from 2:49:00 in this podcast:
It indicates that bioanchors makes a stronger statement than it does, and that EAs are much more dogmatic about that report than most EAs are. Although, to be fair, he did say “probably” here.
Upvote-disagree. I think you’re missing an understanding of how influential it was in OpenPhil circles, and how politically controlling OpenPhil has been of EA.
This seems very wrong to me from my experience in 2022 (though maybe the situation was very different in 2021? Or maybe there is some other social circle that I wasn’t exposed to which had these properties?).
Which claim?
I think williawa’s characterization of how people reacted to bioanchors basically matches my experience and I’m skeptical of the claim that OpenPhil was very politically controlling of EA with respect to timelines.
And I agree with the claim that Eliezer often implies people interpreted bioanchors in some way they didn’t. (I also think bioanchors looks pretty reasonable in retrospect, but this is a separate claim.)
OpenPhil was on the board of CEA and fired its Executive Director, and to this day has never said why; it made demands about who was allowed to have power inside the Atlas Fellowship and who was allowed to teach there; it would fund MIRI at 1/3rd of the full amount for (explicitly stated) signaling reasons; in most cases it would not be open about why it would or wouldn’t grant things (often even with grantees!), which left me just having to use my sense of ‘fashion’ to predict who would get grants and how much; I’ve heard rumors, which I put some credence on, that it wouldn’t fund AI advocacy stuff in order to stay in the good books of the AI labs… there was really a lot of opaque politicking by OpenPhil, which would of course have a big effect on how people were comfortable behaving and thinking around AI!
It’s silly to think that a politically controlling entity would have to punish people for stepping out of line on one particular thing in order for people to conform on that particular thing. Many people will compliment a dictator’s clothes even when he didn’t specifically ask for that.
My core argument in this post isn’t really relevant to anything that was happening in 2020, because people weren’t really pushing on concrete changes to safety practices at AI companies yet.
My guess is still that calling folks “moderates” and “radicals”, rather than using more specific names (perhaps “marginalists” and “reformers”), is a mistake, and fits naturally into conversations about things it seems you didn’t want to be talking about.
Relatedly, calling a group of people ‘radicals’ seems straightforwardly out-grouping to me; I can’t think of a time when I would be happy to be labeled a radical, so it feels a bit like tilting the playing field.
Agreed. Also, I think the word “radical” smuggles in assumptions about the risk, namely that it’s been overestimated. Like, I’d guess that few people would think of stopping AI as “radical” if it was widely agreed that it was about to kill everyone, regardless of how much immediate political change it required. Such that the term ends up connoting something like “an incorrect assessment of how bad the situation is.”
Proof that Yampolskiy “loves being an extremist”, as opposed to just stating what he honestly believes???
This is a good contribution (strong upvote), but this definition of ‘moderate’ also needs to include not attempting to cause major changes within the company. Otherwise I think many of these points do not apply: within big companies, if you want major change/reform you will often have to engage in some amount of coalitional politics, you will often have an incentive to appear very confident, if your coalition is given a bunch of power then you often will not actually be forced to know a lot about a domain before you can start acting in it, etc.
I wonder if the distinction being drawn here is better captured by the names “Marginalists” vs “Revolutionaries”.
In the leftist political sphere, this distinction is captured by the names “reformers” vs “revolutionaries”, and the argument about which approach to take has been going on forever.
I believe it would be worthwhile for us to look at some of those arguments and see if previous thinkers have new (to us) perspectives that can be informative about AI safety approaches.
Interestingly I think some of the points still apply in that setting: people in companies care about technical details so to be persuasive you will have to be familiar with them, and in general the points about the benefits of an informed audience seem like they’re going to apply.
My guess is there are two axes here:
technocratic vs democratic approaches (where technocratic approaches mean you’re talking to people informed about details, while democratic approaches mean your audience is less informed about details but maybe has a better sense of wider impacts)
large-scale vs small-scale bids (where large-scale bids are maybe more likely to require ideologically diverse coalitions, while small-scale bids have less of an aggregate impact)
Big changes within companies are typically bottlenecked much more by coalitional politics than knowledge of technical details.
Sure, but I bet that’s because in fact people are usually attuned to the technical details. I imagine if you were really bad on the technical details, that would become a bigger bottleneck.
[Epistemic status: I have never really worked at a big company and Richard has. I have been a PhD student at UC Berkeley but I don’t think that counts]
I think one effect you’re missing is that the big changes are precisely the ones that tend to mostly rely on factors that are hard to specify important technical details about. E.g. “should we move our headquarters to London” or “should we replace the CEO” or “should we change our mission statement” are mostly going to be driven by coalitional politics + high-level intuitions and arguments. Whereas “should we do X training run or Y training run” are more amenable to technical discussion, but also have less lasting effects.
Do you not think it’s a problem that big-picture decisions can be blocked by a kind of overly strong demand for rigor from people who are used to mostly thinking about technical details?
I sometimes notice something roughly like the following dynamic:
1. Person A is trying to make a big-picture claim (e.g. that ASI could lead to extinction) that cannot be argued for purely in terms of robust technical details (since we don’t have ASI yet to run experiments, and don’t have a theory yet),
2. Person B is more used to thinking about technical details that allow you to make robust but way more limited conclusions.
3. B finds some detail in A’s argument that is unjustified or isn’t exactly right, or even just might be wrong.
4. A thinks the detail really won’t change the conclusion, and thinks this just misses the point, but doesn’t want to spend time, because getting all the details exactly right would take maybe a decade.
5. B concludes A doesn’t know what they’re talking about and continues ignoring the big picture question completely and keeps focusing on more limited questions.
6. The issue ends up ignored.
It seems to me that this dynamic is part of the coalitional politics and how the high-level arguments are received?
Yes, that can be a problem. I’m not sure why you think that’s in tension with my comment though.
I don’t think it’s *contradicting* it but I vaguely thought maybe it’s in tension with:
Because lack of knowledge of technical details by A ends up getting B to reject and oppose A.
Mostly I wasn’t trying to push against you, though; I was more trying to download part of your model of how important you think this is, out of curiosity, given your experience at OA.
A key crux is that I don’t generally agree with this claim in AI safety:
In this specific instance, it could work, but in general I think ignoring details is a core failure mode of people who tend towards abstract/meta stuff, which is absolutely the case on LessWrong.
I think abstraction/meta/theoretical work is useful, but also that theory absolutely does require empirics to make sure you are focusing on the relevant parts of the problem.
This especially is the case if you are focused on working on solutions, rather than trying to get attention on a problem.
I’ll just quote from Richard Ngo here, because he made the point shorter than I can (it’s in a specific setting, but the general point holds):
But the problem is that we likely don’t have time to flesh out all the details or do all the relevant experiments before it might be too late, and governments need to understand that based on arguments that therefore cannot possibly rely on everything being fleshed out.
Of course I want people to gather as much important empirical evidence and concrete detailed theory as possible asap.
Also, the pre-everything-worked-out-in-detail arguments need to inform which experiments are done, which is why people who have actually listened to those pre-detailed arguments end up on average doing much more relevant empirical work, IMO.
This comment articulates the main thought I was having while reading this post. I wonder how Buck is avoiding this very trap, and whether there is any hope at all of the moderate strategy overcoming this problem.
I guess there’s also “do you expect to have enough time with your audience for you to develop an argument and them to notice flaws”, which I think correlates with technocratic vs democratic but isn’t the same.
I think that this friendliness has its own very large epistemic effects. The better you know people, the more time you spend with them, and the friendlier you are with them, the more cognitive dissonance you have to overcome in order to see them as doing bad things, and especially to see them as bad people (in the sense of their actions being net harmful for the world). This seems like the most fundamental force behind regulatory capture (although of course there are other factors there like the prospect of later getting industry jobs).
You may be meaning to implicitly recognize this dynamic in the quote above; it’s not clear to me either way. But I think it’s worth explicitly calling out as having a strong countervailing epistemic impact. I’m sure it varies significantly across people (maybe it’s roughly proportional to agreeableness in the OCEAN sense?), and it may not have a large impact on you personally, but for people to weigh the epistemic value of working as a moderate, it’s important that they consider this effect.
A related phenomenon: Right-leaning Supreme Court justices move left as they get older, possibly because they’re in a left-leaning part of the country (DC) and that’s where all their friends are.
[…]
This is a significant effect in general, but I’m not sure how much epistemic cost it creates in this situation. Moderates working with AI companies mostly interact with safety researchers, who are not generally doing bad things. There may be a weaker second-order effect where the safety researchers at labs have some epistemic distortion from cooperating with capabilities efforts, and this can influence external people who are collaborating with them.
Fair point, that does seem like a moderating (heh) factor.
FWIW I’m the primary organizer of PauseAI UK and I’ve thought about this a lot.
I agree with Buck’s statement.
However, I also feel like, for the last 10 years, the reverse facet of that point has been argued all the time ad nauseam, both between people on LessWrong and as criticism coming from outside the community: “People on LessWrong care about epistemic purity, and they will therefore never ever get anything done in the real world. It’s easy to have pure epistemics if you’re just sitting with your friends thinking about philosophy. If LessWrong really cared about saving the world they would stop with ‘politics is the mind-killer’ and ‘scout mindset’ and start actually trying to win.”
And I think that criticism has some validity. “The right amount of politics is not zero, even though it really is the mind-killer.”
But I also think the arguments for taking AI x-risk very seriously are unusually strong compared with most political debates. It’s an argument we should be able to win while speaking only the whole truth. And in some sense, it can’t be allowed to become an “ordinary” political issue; if it does, the action will not be swift and decisive enough. And if people start making a lot of misleading (even if not false) statements, the risk of that becomes very high.
Has this succeeded? And if so, do you have specific, concrete examples you can speak about publicly that illustrate this?
Mostly AI companies researching AI control and planning to some extent to adopt it (e.g. see the GDM safety plan).
Mostly unfiltered blurting
Counterfactual?
Control is super obvious and not new conceptually; rather it’s a bit new that someone is actually trying to do the faffy thing of making something maybe work. I think it’s pretty likely they’d be doing it anyway?
Counterpoint: companies as group actors (in spite of intelligent and even caring constituent humans) are mostly myopic and cut as many corners as possible by default (either due to vicious leadership, corporate myopia, or (perceived) race incentives), so maybe even super obvious things get skipped without external parties picking up the slack?
The same debate could perhaps be had about dangerous capability evaluations.
Even though the basic ideas are kind of obvious, I think that us thinking them through and pushing on them has made a big difference in what companies are planning to do.
I doubt this is what Buck had in mind, but he’s had meaningful influence on the various ways I’ve changed my views on the big picture of interp over time.
Seems like a huge point here is the ability to speak unfiltered about AI companies? The radicals working outside of AI labs would be free to speak candidly, while the moderates would have some kind of relationship to maintain.
I agree this is a real thing.
Note that this is more important for group epistemics than individual epistemics.
Also, one reason the constraint isn’t that bad for me is that I can and do say spicy stuff privately, including in pretty large private groups. (And you can get away with fairly spicy stuff stated publicly.)
This strikes me as a fairly strong strawman. My guess is that the vast majority of thoughtful radicals basically have a similar view to yours. Indeed, at least from your description, it’s plausible my view is more charitable than yours: I think a lot of it is also endangering humanity due to cowardice, following of local incentives, etc.
Note that I think something like this describes a lot of people working in AI risk policy, and therefore seems like more than a theoretical possibility.
This sounds like the streetlight bias, superficially? Just because your audience is intelligent and knowledgeable doesn’t mean it’s the right audience, and doesn’t mean the stance that led you to them is correct.
Isn’t this a disadvantage? If third parties that disagree with you were able to criticize the spicy things you say, and possibly counter-persuade people from AI companies, you would have to be even more careful.
That leads to things like corporate speak that is completely empty of content. The critics are adversarial or misunderstand you, so the incentives are very off.
Avoiding what you suggested is why private conversations are an advantage. I think you misunderstood the essay, unless I’m misunderstanding your response.
I really like this post.
There’s also a “moderates vs radicals” distinction when it comes to attitudes, certainty in one’s assumptions, and epistemics, rather than (currently-)favored policies. While some of the benefits you list are hard to get for people who are putting their weight behind interventions to bring about radical change, a lot of the listed benefits fit the theme of “keeping good incentives for your epistemics,” and so they might apply more broadly. E.g., we can imagine someone who is “moderate” in their attitudes, certainty in their assumptions, etc., but might still (if pressed) think that radical change is probably warranted.
For illustration, imagine I donate to Pause AI (or join one of their protests with one of the more uncontroversial protest signs), but I still care a lot about what the informed people who are convinced of Anthropic’s strategy have to say. Imagine I don’t think they’re obviously unreasonable, I try to pass their ideological Turing test, I care about whether they consider me well-informed, etc. If those conditions are met, then I might still retain some of the benefits you list.
What about the converse, the strategy for bringing about large and expensive changes? The fact that you don’t discuss that part makes it seem like you might agree with a picture where the way to attempt large and expensive changes is always to appeal to a mass audience (who will be comparatively uninformed). However, I think it’s at least worth considering that promising ways towards pausing the AI race (or some other types of large-scale change) could go through convincing Anthropic’s leadership of problems in their strategy (or, more generally, through convincing some other powerful group of subject-matter experts). To summarize: whether radical change goes through mass advocacy and virality vs. convincing specific highly informed groups and experts seems like somewhat of an open question and might depend on the specifics.
I think looking for immediately-applicable changes which are relevant to concrete things that people at companies are doing today need not constrain you to small changes, and so I would not use the words you’re using, since they seem like a bad basis space for talking about the moving parts involved. I agree that people who want larger changes would get better feedback, and end up with more actionable plans, if they think in terms of what change is actually implementable using the parts available at hand to people who are thinking on the frontier of making things happen.
I think there is a fair amount of overlap between the epistemic advantages of being a moderate (seeking incremental change from AI companies) and the epistemic disadvantages.
Many of the epistemic advantages come from being more grounded or having tighter feedback loops. If you’re trying to do the moderate reformer thing, you need to justify yourself to well-informed people who work at AI companies, you’ll get pushback from them, and you’re trying to get through to them.
But those feedback loops are with reality as interpreted by people at AI companies. So, to some degree, your thinking will get shaped to resemble their thinking. Those feedback loops will guide you towards relying on assumptions that they see as not requiring justification, using framings that resonate with them, accepting constraints that they see as binding, etc. Which will tend to lead to seeing the problem and the landscape from something more like their perspective, sharing their biases & blindspots, etc.
I think that to many in AI labs, the control agenda (in its full ambition) is seen as radical (it’s all relative), and to best persuade people that it’s worth pursuing rigorously, you do in fact need to engage in coalitional politics and anything else that increases your chances of persuasion. The fact that you feel like your current path doesn’t imply doing this makes me more pessimistic about your success.
This is an anonymous account but I’ve met you several times and seen you in action at AI policy events, and I think those data points confirm my view above.
Curated. This helpfully pointed out some important dynamics in the discourse that I think are present but have never quite been made explicit. As per the epistemic notice, I think this post was likely quickly written and isn’t intended to reflect the platonic ideal set of points on this, but I stand behind it as being illuminating on an important subject and worth sharing.
Indeed it was quickly written, I think it was like 20 mins of writing and 30 mins of editing/responding to comments. Based on feedback, I think it was probably a mistake to not put more time into it and post a better version.
Would you prefer me to pause the curation for a day while you do that? We have 10-20 mins before it gets emailed out to ~30k ppl.
Yes, I’d appreciate that. DM me on slack?
I want to pull out one particular benefit that I think swamps the rest of the benefits, and in particular explains why I tend to gravitate to moderation over extremism/radicalism:
Caring about the real world details of a problem is often quite important in devising a good solution, and is arguably the reason why moderates in politics generally achieve their personal goals more than radicals.
Rationalists/EAs are generally better at this than most people, due to decoupling norms being more accepted, but there is a real problem when people forget that the real-life details of AI actually matter in designing solutions to the AI alignment problem.
Richard Ngo has talked before about how Eliezer’s intuitions on this topic are similar to a mathematician’s intuitions about a theorem, but where his choice to be abstract and avoid details pretty much blocks solutions to the problem. While this is less prevalent than in 2018, it does still linger (AI control is probably the paradigmatic example of an indirect solution to the alignment problem that depends on the details of what AI is capable of).
Similarly, Eliezer’s That Alien Message and Einstein’s Speed definitely have a vibe that you can reasonably expect to ignore empirical details and still be right by pure algorithmic intelligence.
(Though at least for That Alien Message, there’s substantially more computation involved than humans usually perform):
https://x.com/davidad/status/1841959485365223606
That doesn’t mean abstraction is useless, but it does mean we have to engage in real-world details if we want to solve problems.
Thanks for writing this. I’m not sure I’d call your beliefs moderate, since they involve extracting useful labor from misaligned AIs by making deals with them, sometimes for pieces of the observable universe or with verification by future tech.
On the point of “talking to AI companies”: I think this would be a healthy part of any attempted change, although I see that PauseAI and other orgs tend to talk to AI companies in a way that seems to try to make them feel bad by directly stating that what they are doing is wrong. Maybe the line here is “you make sure that what you say will still result in you getting invited to conferences,” which is reasonable, but I don’t think that talking to AI companies gets at the difference between you and other forms of activism.
Very glad of this post. Thanks for broaching, Buck.
Status: I’m an old nerd, lately in ML R&D, who dropped my career and changed wheelhouse to volunteer at Pause AI.
Two comments on the OP:
As per Joseph’s response: this does not match me or my general experience of AI safety activism.
Concretely, a recent campaign was specifically about DeepMind breaking particular voluntary testing commitments, with consideration of how staff would feel.
I just cannot do this myself.
(There is some amount of it around, but also it is not without value. See later.)
Gideon F:
Reporting from inside: I rate it a good guess, especially when you weight by “thoughtful”.
Anthony feels seen / imagined.
Some for sure. The important one I notice I’m struggling to get is engaged two-way conversation with frontier lab folk. Trade-off.
Back to faceless companies: some activists, including thoughtful ones, are more angry than me. (Anthropic tend to be a litmus test. Which is fun given their pH variance week to week.)
Exasperated steel man: these lab folk are externalizing the costs of their own risk models and tolerances without any consent. This doesn’t seem very epistemically humble. But I get that the virtue math is fragile and so I feel sympathy and empathy for many parties here.
Still, for both the emotional health of the activists and the odds of public impact, radicals helping each other feel some aggravated anger does seem sane. In this regard, as in others, I find there are worthwhile things to learn and evaluate from the experience of campaigners who were never in EA or on LessWrong.
I’ll risk another quote without huge development—williawa:
For me: well-phrased, then insightful.
Lastly, Kaleb:
and Lukas again:
My response to both of these is pretty much “¿por qué no los dos?” (why not both?). This is not zero-sum. Let us apply disjunctive effort.
It is even the case that a “pincer movement” helps: a radical flank primes an audience for moderate persuasion. (This isn’t my driver: of course I express my real position. But it makes me less worried about harm if I’m on the wrong side.)
Thank you for writing this up, I was glad to reflect on it.
I think there might be a confusion between optimizing for an instrumental goal vs. a higher-level goal. Is maintaining good epistemics more relevant than working on the right topic? To me, the rigor of an inquiry seems secondary to choosing the right subject.
Consider the following numbered points:
1. In an important sense, other people (and culture) characterize me as perhaps moderate (or something else). I could be right, wrong, anything in between, or not even wrong. I get labeled largely based on what others think and say of me.
2. How do I decide on my policy positions? One could make a pretty compelling argument (from rationality, broadly speaking) that my best assessments of the world should determine my policy positions.
3. Therefore, to the extent I do a good job of #2, I should end up recommending policies that I think will accomplish my desired goals even when accounting for how I will be perceived (#1).
This (obvious?) framework, executed well, might subsume various common (even clichéd) advice that gets thrown around:
Be yourself and do what needs to be done, then let the cards fall as they may.
No one will take your advice if you are perceived as crazy.
Many movements are born by passionate people perceived as “extreme” because important issues are often polarizing.
It can be difficult to rally people around a position that feels watered down.
Pick something doable and execute well to build momentum for the next harder thing.
Writing legislation can be an awful slog. Whipping votes requires a lot of negotiation, some unsavory. But all this depends on years of intellectual and cultural groundwork that softened the ground for the key ideas.
P.S. When I first came here to write this comment, I had only a rough feeling along the lines of “shouldn’t I choose my policy positions based on what I think will actually work, and not worry about how I’m perceived?” But I chewed on it for a while. I hope this is a better contribution to the discussion, because I think it is quite a messy space to figure out.