I don’t especially agree with the argument presented as a good reason to not advocate for a stop. But what you wrote is not a plausible summary of the case as I presented it. It’s claiming that cooption is likely and bad, including from an X-risk perspective. (I personally don’t view it as so likely and so bad that the expected costs outweigh the benefit of supporting a global stop on AGI research.)
TsviBT
Here is a concern (which I’m paraphrasing/copying from others in a group chat, not my own understanding). (I should note, for broad epistemic hygiene, that this group chat seems to have a lot of hostility toward “stop AGI research” stuff in general, so it could be an idea that doesn’t actually hold water and is designed to sound plausible but be more demoralizing than helpful; but it seems prima facie interesting & important to me.)
These plans about social support for global AGI bans involve creating and coalescing a big blob of social/political will. The actors who are currently trying to create that political will are not highly competent political actors. In general, when there is a big blob of political will not already controlled/wielded by highly competent political actors, that blob is likely to soon come under the control of highly competent political actors. There’s no strong reason those coopters would be at all desirable as wielders of the blob, according to the original creators of the blob.
This has happened before. A source of examples is the Progressive Era. One example is the Interstate Commerce Commission: it was supposed to protect the people’s interests against the railroad companies exercising political power, but maybe kinda ended up being used by the railroads to suppress competition.
In particular, the broad “stop AI” movement is gathering a big blob of political will. What are they gathering political will towards? Many of them are hoping to make a blob for “international treaty to prevent AGI research”, and they are saying that’s what they’re aiming for, and in the mental sense they are trying to do so. But in the likely-actual-results sense, they aren’t working towards that. What they are actually working towards is [whatever is desired by the highly competent political actors who will coopt and wield the blob].
More concretely, imagine that AI companies make some sort of sham consortium of “industry self-regulation”. Then, they get that consortium included in the “is this AI research allowed” panel. Then, they disallow everyone else’s research, but keep doing their own research. Even if some sort of international treaty goes through, one way or another, perhaps in the name of continuing “AI alignment research”, these forces get exceptions put in the international treaty, with nice words like “international collaborative AI safety commission special exception safe allowed safety research”. Then they get to suppress everyone else and continue doing omnicide-intelligence-explosion research relatively undisturbed. This is bad in a bunch of ways (unfair, and supports unscrupulous national actors), and doesn’t prevent AGI X-risk. If it’s not a consortium, maybe just individual AI companies do that. Or EAs with deep Anthropic ties do that. Or anti-US forces coopt the blob to hurt US AI.
This is maybe kinda already happening. Is there foreign money propping up a social movement against the American AI industry? Did Open Philanthropy and OpenAI team up to do AI regulatory capture or something? What’s your ?
Even if you keep saying “don’t stop US AI, stop all AI globally with a global treaty”, that doesn’t mean that the broader group of people who constitute the body of political will you have gathered will be making those distinctions. A highly competent political actor could lead them—to somewhere that you would not have led them, but where they don’t especially mind being led, at least as revealed by their actual behavior. If you don’t have a plan for pointing your blob at an international ban on AGI research and not having someone else point it at other things, then maybe you shouldn’t make the blob. Do not conquer what you cannot defend.
Curious if you have thoughts about this.
Thanks!! Happy to be seeing this work.
Public opinion is low-leverage on a decision like this.
Even a priori, I’d say that due to uncertainty about how much this is true, and due to the multiple pathways that organized public opinion can help, and due to general principles of it being good to be organized and state beliefs clearly, a large organized social movement is an important point of intervention. Cf. also “Every point of intervention”.
A point someone made to me is something like (in my own words):
One key aspect of the USA civil rights movement was legible authoritative movement leaders. Political leaders knew who they could call up to get authoritative info on what demands the movement is making, such that satisfying those demands would avert antagonistic protests and would gain political support from the movement.
I think you sort of address this, but partly by punting to policy orgs (which makes a lot of sense and seems correct, TBC), and partly by policies that are somewhat internal (?), e.g. internally routing certain things up the hierarchy. I didn’t read the post fully, so maybe you directly addressed this, but the concrete suggestion would be “make the authoritative sources of demands be very externally legible”.
Kinda? It’s not really about stages. It’s just granularity of selection, period. For example, 1-stage chromosome selection, separately on two people’s gametes, is more powerful than 2- or 3-stage iterated embryo selection with realistic numbers, probably.
I think many people overrate how politically salient AI is.
I would say that much of my intuition that a pause (yes, driven by various national governments, and public desire) is plausibly doable comes from “the trajectory” of sentiments, rather than the total amount. I agree as a fraction of total political discourse it’s quite small.
Anti-AI sentiment is all over the place, but I think it’s a mile wide and an inch deep.
This vaguely matches my impression for the most part, in the sense that if you look at discussion on AI in general, much of it may be negative but most of that isn’t people who deeply worry + care about x-risk. But to refine the picture, it’s a mile wide, inch deep, but getting full of holes: it seems like there’s an ever-growing number of people, including higher-ups both in AI and also in government, who seem to take x-risk seriously, largely in words but also in more meaningful ways like drafting bills and stuff like that.
I think any politicians that did serious damage to the U.S. economy and potentially started wars to pause AI would be electorally punished.
Pardon my stupid question, but what goes wrong concretely? If you ban AGI, but let people keep running existing LLMs (say), does this really cause big and legible enough economic damage that voters would actually move? I mean, I’d think there’s lots of economically damaging things that don’t get credit-assigned into much actual vote shifts.
My concern is that a weak pause drives AI development underground, differentially hurts safety, and doesn’t allow people to update in the direction of a real pause. Like, I think a world where AI development is nominally illegal, but the Chinese and U.S. governments both have well-funded secret programs, is much worse than Evan’s proposal and likely worse than the status quo.
I may bow out, and feel free to take the (maybe) last word, but I’ll just note that this still doesn’t make sense to me basically at all. Like, yes, there’s kinda-plausible scenarios where a global pause surprisingly ends up worse. But it would still be surprising, right? Like, there’s probably less human cloning right now, compared to the counterfactual where it wasn’t banned, right?? Someone could go underground with it, but that’s really hard and takes work!
I agree that slightly/somewhat unprecedentedly much access (official or espionage) might have to be somehow granted + enforced for treaty implementation. Maybe this is a strong defeater, I just don’t see it. Like, I agree it seems potentially kinda hard or quite hard in some scenarios. But this “global ban is actually worse than hugely resourced companies going full tilt, or even slightly less full tilt because of a ‘slowdown’” seems galaxy-brained and false.
I’d also point out that a treaty isn’t a thing that happens, and then whatever’s written in the treaty determines how well the treaty helps the situation. A treaty is a step in a broader process that can continue to adapt and develop to avoid the dangerous stuff happening. Cf. https://www.lesswrong.com/posts/Sdrzo7z3STzdrnwKW/what-exactly-would-an-international-ai-treaty-say-is-a-bad
people and governments taking AGI very seriously, something that is not the case right now and I believe is unlikely to become the case under a pause.
By a “pause”, the main thing I mean is “an international treaty to stop AGI globally”. I assume you’d think that’s unlikely because it’s unlikely that people + gvts would take it seriously enough. I don’t want to make a strong claim about it being likely feasible, and presumably even if it’s doable it would be a huge amount of hard work. But are you strongly claiming that it’s infeasible? If so, that’s the position I’d like to understand—why do you think that, if you do? Is this a case that’s been worked out and explained somewhere? Has it been debated seriously?
we are very close to RSI/AGI/ASI in which case I think a global pause does not slow progress enough (due to enforcement issues)
I mean, I agree that it’s kinda hard, but if AGI is very close, it probably involves lots of compute, right? Big piles of compute seem plausibly regulable. That doesn’t seem like an infeasible enforcement issue.
we are not very close to RSI/AGI/ASI in which case we should hold off on a hard pause because the case for it will be stronger in the future. …. although I do worry that a stop would lower the salience of the issue.
Neither of these arguments makes sense to me, and both seem quite opposite to the truth. Like, we should stop ASAP so that we’re not in a terrible time crunch, right?
Both of those posts have the form of “what would an agreement say” which I think is totally missing the hard part. So I think that points at the answer to your original question, and why others regard it as obvious and you do not.
You might be overinferring what I think these blog posts indicate? I’m just gesturing that I agree that the overall project of figuring out how the whole thing might be feasible is a worthy project.
The answer is “because there’s no political will”.
I know that this is a thing people say, and I agree there isn’t already automatically political will pre-gathered. But if the implication is that it would be an infeasible task to create and gather the political will for a global stop, that implication is one I strongly question! And so far I hear lots of signs pointing in the opposite direction, and I’m grateful to the people working on that. I just wish that Anthropic would support those efforts.
WRT the Anthropic office visits: This has the general form of “it’s their fault not ours” which is suspicious.
Not blaming, describing. Can’t survive without describing.
(Anyway, just FYI, your time might be somewhat wasted if you want to get me on board with a particular approach / stance, because I’m much more commenting from the sidelines rather than an active participant; I’m focusing on other things, while others are actually working on communicating with the public and political leaders and so on.)
But I have heard people from the developer side of the fence say that they find LW a hostile environment and have trouble engaging here even though they feel they should. And the tone of discussions here certainly looks like tribal dynamics and polarization are happening.
This makes sense. I will note however that when I (one time) asked an Anthropic employee about inviting someone over to their offices to explain / argue more in depth some crucial point (I forget; I think alignment difficulty), they said something like “last one or two times we tried that, the guest was dismissive”. So like, it looks a whole lot more like the crux is self-insulation, even if there is also undue hostility on LW. But, that is N=1. (I have other Ns that look like self-insulation, though of course that’s almost inherently ambiguous and my total N is small.)
(I do think in the past I’ve at least watched from the sidelines, or even slightly participated in, arguably-undue dogpiley polite arguing, if not hostility.)
I won’t dive into that further right now, but I think it is a worthy collaborative project for LWers.
Definitely agree. (Cf. https://www.lesswrong.com/posts/Sdrzo7z3STzdrnwKW/what-exactly-would-an-international-ai-treaty-say-is-a-bad and https://www.lesswrong.com/posts/X9Z9vdG7kEFTBkA6h/what-could-a-policy-banning-agi-look-like )
I agree re/ networks (https://tsvibt.blogspot.com/2022/09/dangers-of-deferrence.html). I also agree with polarization being bad. However:
Often I think “fear of polarization” ends up making the person not able to ask tough questions at all; and sometimes they end up straight up going over to work on bad stuff.
People working on bad stuff are absolutely taking advantage of orientations like “fear of polarization”. (Not everyone, and it’s a mix of sympathetic / unsympathetic, intentional / unintentional; but still happening.) For example, I suspect this is a primary enabler of self-deception—being unclear about what’s wrong about someone’s beliefs or actions.
I think this means something like, you strongly strongly criticize the behavior, but not demonize the person. I have for example said this: https://www.lesswrong.com/posts/CYTwRZtrhHuYf7QYu/a-case-for-courage-when-speaking-of-ai-danger?commentId=pLH6dxnTrTz56BQYj
I’m curious how else we-broadly can go about this better.
Framing it as you did without context looks somewhat combative to my eye,
I usually have a pretty bad intuitive reaction to people telling me that my writing is “too combative”, but I don’t know why fully, and in a small fraction of the cases I end up agreeing with them later, so I won’t respond further. I do want to say what I just said, though; not sure why, but maybe, for example, I feel that it’s somehow unfair, though since I can’t explain how, this is a very unreliable sense.
and also seems like an isolated demand for rigor
I think it’s a really important and central supposition, and apparently has not been rigorously defended anywhere! This is 100% definitely not what the concept of “isolated demand for rigor” is for! What other such things am I allegedly not demanding rigor for, that I ought to be? Or do you disagree / am I confused somehow?
So I stand by my initial assessment of “concerning and probably more harm than good”.
Ok. (If you wanted to update me personally, you haven’t done that on this point.)
Showing proof of effort by laying out the reasons you’re asking makes it clear that you’re not trolling.
Until this point, I have been genuinely unsure if there’s simply a report / blog post / something that someone might just link me to, explaining the case!! But yes, I think you’re right, I now agree it would be better for a comment like my first to come with a few sentences explaining that it seems like there isn’t such a case, that a global stop seems plausibly feasible, that Anthropic seems super far from appropriately supporting that, and that they should.
Ah ok here’s something that’s implicit, that I can now make explicit: IF you’re taking big actions (e.g. Anthropic doing frontier AGI research) AND your justifications for doing that rely on a quite unclear claim X, THEN you should have reasoned out X pretty well and defended it publicly OR ELSE you probably are deceiving yourself and/or others about why you’re doing what you’re doing. (There’s plenty of exceptions, but still.)
Sorry, I wasn’t fully explicit. When I said “Advocate for a global stop on AGI research”, I meant for us (broadly, including Evan, say) to advocate to world governments that they should institute a global stop, not that anyone’s supposed to persuade labs to stop. (I mean, separately I’d like to push on many points of intervention, including persuading AGI researchers to stop it.)
I still think your questions may include a degree of Socratic trolling that I think is ultimately unhelpful
I’m really not. Not sure why you think that. I can try to lay more things out, though I think I’ve indicated some / most of this? Anyway:
I think it’s plausible that a global stop could be coordinated. I’m a layman, so I can’t, like, argue for this super well or coherently.
I think that it’s bad for labs to be doing AGI research.
Anthropic seems to justify pushing the frontier by saying a global pause is probably not doable.
I have literally never heard a serious coherent case laid out that this is not doable. It’s just something that it’s embarrassing if you don’t understand. But that’s garbage reasoning.
If I had to guess, I would guess that most of them (non-leadership Anthropic employees) broadly are just mindfucked somehow. However, I see plenty of bits of apparent good faith and good intentions, I know some of the people there and think they are smart and have or had good intentions, etc. Also, generally it seems rude to just assume there is no such case without checking. So, I have been trying to check by asking some Anthropic employees, and a bit of searching, and asking on LW. So far, no one has said “oh yeah here’s a link”, but people continue to argue or otherwise act as though it’s obvious. You tell me what I’m supposed to make of this! Maybe this is what comes off as trolling? Maybe we really should unban Said!
You can do alignment research without doing current-flavored capabilities research. For example, no one has the concepts needed to understand values, which would probably be needed to do alignment; this can, and probably must, be investigated conceptually.
An intervention has many effects and differentially affects many processes, not just capabilities and alignment. Cf. https://www.lesswrong.com/posts/K4K6ikQtHxcG49Tcn/hia-and-x-risk-part-2-why-it-hurts#An_ontology_of_effects_of_interventions_on_world_processes For example, a stop gives everyone more time to understand the problem better and figure out ways to continue stopping. It also gives more time for human intelligence amplification, and other ways that humanity can make itself more able to survive the possibility of AGI.
I suspect this may be the case. I’d like to know how people think about their beliefs about this question, but the way Anthropic employees (n=3ish) act when I try to ask about it is confusing. I wish they would just say “I’m relying on the judgement of so-and-so” or something like that, as it would be much clearer. (Cf. https://www.lesswrong.com/posts/zLG2DnJw6oEZqAgaE/tools-for-deferring-gracefully )
I think this is a solid partial example of what I call confrontation-worthy empathy: https://www.youtube.com/watch?v=QYVOxn-Ndxw
I thought it was pretty clear why requirements to do more alignment work would be a lot easier and more attainable an ask than a pause.
Well, Evan wrote:
I think this is much better than other slowdown proposals I’ve seen, since I think it’s very concrete, verifiable, and enforceable
It seems more attainable, sure, but also way way less useful, to the point where I don’t understand how it’s better than even something as sketchy as a napkin with “Advocate for a global stop on AGI research” written on it. Do you disagree?
We probably can’t get through a US government commitment to unilaterally pausing
I’m not especially advocating for that, though it might be good, IDK. I’m advocating for Anthropic to advocate for a global agreement.
I don’t know where anyone has tried to carefully make the case that a pause is impossible. It’s more that there’s an assumption that it’s quite hard, so the burden of proof falls more on those calling for one to propose a plan that could work. I don’t think a pause is impossible, although I think it’s unlikely to get one right now soon.
Do you agree that this assumption is a founding pillar of Anthropic’s strategy, at least as they present it? If this assumption is justifying doing frontier AGI research, should it be, like, argued somewhere? Don’t you think it’s weird that Anthropic as a whole big entity seems to be acting on this assumption, without a public argument for it? Do you think there’s no good argument for it, or do you think that it’s all in private somehow?
BTW, do you work at Anthropic?
Feasible overall, so including political feasibility.
Surely you could give an elaborate answer yourself.
I really genuinely couldn’t!
I can list reasons that it’s difficult, and I can list reasons that it’s quite difficult in the long run, assuming compute isn’t a big bottleneck. But I don’t believe that the assumption of long-run, non-compute-bottlenecked research being the main driver of AGI progress is an assumption held by Evan or by Anthropic. I could be incorrect about this, happy to be corrected.
I don’t know it to be infeasible to globally prevent AGI research that involves significant resources. Do you know this? I’m genuinely asking. I’m not aware of any serious case laid out for this. That’s my ignorance, someone could just link such to me! (I did run an AI search for such, without relevant results.)
Like, in my ignorance, as far as I’ve personally seen / tracked, Anthropic seems to be almost entirely, though not entirely, messaging on the presumption that a global stop is infeasible; and they use that presumption as justification for leading frontier AGI research including RSI. Is this presumption seriously defended anywhere?
This argument is useful because for example it could point at ways we could improve our chances at succeeding.
I agree it’s a weak argument for actually not doing this sort of advocacy. I’m in favor of this sort of advocacy. The argument does make a case against it, though: it says that the movement will likely be coopted, leading to outcomes that are worse than nothing by our lights, for example by making AGI research companies more entrenched + powerful, or by putting China in the sole lead of AGI research, or other such outcomes.