What an actually pessimistic containment strategy looks like
Israel as a nation state has an ongoing national security issue involving Iran.
For the last twenty years or so, Iran has been covertly developing nuclear weapons. Iran is a country with a very low opinion of Israel and is generally diplomatically opposed to its existence. Its supreme leader has a habit of saying things like “Israel is a cancerous tumor of a state” that should be “removed from the region”. For these and other reasons, Israel has assessed, however accurately, that if Iran successfully develops nuclear weapons, it stands a not-insignificant chance of using them against Israel.
Israel’s response to this problem has been multi-pronged. Making defense systems that could potentially defeat Iranian nuclear weapons is an important component of their strategy. The country has developed a sophisticated array of missile interception systems like the Iron Dome. Some people even suggest that these systems would be effective against much of the incoming rain of hellfire from an Iranian nuclear state.
But Israel’s current evaluation of the “nuclear defense problem” is pretty pessimistic. Defense isn’t all it has done. Given the size of Israel as a landmass, it would be safe to say that it’s probably not the most important component of Israel’s strategy. It has also tried to delay, or pressure Iran into delaying, its nuclear efforts through other means. For example, it gets its allies to sanction Iran, sabotages its facilities, and tries to convince its nuclear researchers to defect.
In my model, an argument like “well, what’s the point of all this effort, Iran is going to develop nuclear weapons eventually anyways” would not be very satisfying to Israeli military strategists. Firstly, it is not guaranteed that the Iranians will “eventually” get nuclear weapons. Secondly, conditional on them doing so, it’s not guaranteed it’ll happen within the expected lifetime of the people currently living in Israel, which is a personal win for the people in charge.
Thirdly, even if it’s going to happen tomorrow, every day that Iran does not possess nuclear weapons under this paradigm is a gift. Delaying a hypothetical nuclear holocaust means increasing the life expectancy of every living Israeli.
An argument like “well, what if you actually radicalize the Iranians into hardening their stance on developing nuclear weapons through all of this discouragement” might be pragmatic. But disincentivizing, dissuading, and sabotaging people’s progress toward things generally does what it says on the tin, and Iran is already doing nuclear weapons development. Any “intervention” you can come up with toward Iranian nuclear researchers is probably liable to make things better and not worse. Speaking more generally, Israel still has an instrumental motivation to get Iran to stop its nuclear weapons program, even if a diplomatic strategy would serve that goal better. Israel’s sub-goal of mulliganing their timeline away from a nuclear Iran is probably reasonable.
There are many people on this website who believe the development of AGI, by anyone in the world, would be much worse in expectation than Iran developing nuclear weapons, even from the perspective of a fiercely anti-Iranian nationalist. There are also some people on this website who additionally believe there is little to no hope for existing AI safety efforts to result in success. Since so far there don’t seem to be any good reasons to believe that developing nuclear weapons is harder or more genius-intensive than developing AGI, one might naively assume that these people would be open to a strategy like “get existing top AGI researchers to stop”. After all, that method has had some degree of success with regard to nuclear nonproliferation, and every hour that the catastrophic AGI extinction event doesn’t happen is an hour that billions of people get to continue to live. One would think that this opens up the possibility, and even suggests the strategy, of finding a way to reach and convince the people actually doing the burning of the AGI development commons.
So imagine my surprise when I informally learn that this sort of thinking is quasi-taboo. That people who wholesale devote their entire lives to the cause of preventing an AI catastrophe do not spend much of their time developing outreach programs or supporting nonviolent resistance directed toward DeepMind researchers. That essentially, they’d rather, from their perspective, literally lie down and die without having mounted this sort of direct action.
I find this perspective limiting and self-destructive. The broader goal of alignment, the underlying core goal, is to prevent or delay a global AGI holocaust, not to come up with a complete mathematical model of agents. Neglecting strategies that affect AGI timelines is limiting yourself to the minigame. The researchers at DeepMind ought to be dissuaded or discouraged from continuing to kill everybody, in addition to and in conjunction with efforts to align AI. And the more pessimistic you are about aligning AI, the more opposed you should be to AGI development, and the more you should be spending your time figuring out ways to slow it down.
It seems weird, and a little bit of a Chesterton’s fence, that I’m the first person I know of to broach the subject on LessWrong with a post. I think an important reason is that people think these sorts of strategies are infeasible or too risky, a view I strongly disagree with. To guard against this, I would now like to give an example of such an intervention that I did myself. This way I can provide a specific scenario for people in the comments section to critique, instead of whatever strawman people might associate with “direct action”.
EleutherAI is a nonprofit AI capabilities research collective. Their main goal up until now has been to release large language models like the kind that OpenAI has but keeps proprietary. As a side project they occasionally publish capability research on these large language models. They are essentially a “more open” OpenAI, and while they’re smaller and less capable, I think most people here would agree that their strategy and behavior before 2022, as opposed to their stated goals, were probably more damaging from an AI alignment perspective than even OpenAI’s.
Interestingly, most of the people involved in this project were not unaware of the concerns surrounding AGI research; in fact, they agreed with them! When I entered their Discord, I found it counterintuitive that a large portion of their conversations seemed dedicated to rationalist memes, given the modus operandi of the organization. They had simply learned not to internalize themselves as doing bad things, for reasons many reading this probably understand.
Some people here are nodding their heads grimly; I had not yet discovered this harrowing fact about a lot of ML researchers who are told about the alignment problem. So one day I went into the #ai-alignment (!) channel of the Discord server where their members coordinate and said something like:
lc: I don’t think anybody here actually believes AGI is going to end the world. I find it weird that you guys seem to be fully on the LessWrong/rationalist “AGI bad” train and yet you cofounded an AI capabilities collective. Doesn’t that seem really bad? Aren’t you guys speeding up the death of everybody on the planet?
They gave me a standard post they use as a response. I told them I’d already read the post and that it didn’t make any sense. I explained the whole game surrounding timelines and keeping the universe alive a little bit longer than it otherwise would be. I then had a very polite argument with Leo Gao and a couple of other people from the team for an hour or so. By the end, some members of the team had made some pretty sincere-seeming admissions that the Rotary Embeddings blog post I linked earlier was bad, and some team members personally admitted to having a maybe-unhealthy interest in publishing cool stuff, no matter how dangerous.
I have no idea if the conversation actually helped long term, but my sense is that it did. Shortly thereafter they took a bunch of actions they alluded to in the blog post, like attempting to use these large language models for actual alignment research, instead of just saying that what they were doing was OK because somebody else might after they open sourced them. I also sometimes worry about whether the research they were doing ever resulted in faster development of AGI in the first place, but an institution could have people to assess things like that. An institution could do A/B testing on interventions like these. It can talk to people more than once. With enough resources it can even help people (who may legitimately not know what else they can work on) find alternative career paths.
With these kinds of efforts, instead of telling people who might already be working in some benign branch of ML that there’s this huge problem with AGI (people who can then potentially defect and go into that branch because it sounds cool), you’re already talking to people who, from your perspective, are doing the worst thing in the world. There’s no failure mode where some psychopath gets intrigued by the “power” of turning the world into paperclips; they’re already working at DeepMind or OpenAI. Personally, I think that failure mode is overblown, but this is one way you get around it.
I don’t have the gumption to create an institution like this from scratch. But if any potential alignment researchers or people-who-would-want-to-be-alignment-researchers-but-aren’t-smart-enough are reading this, I’m begging you to please create one so I can give my marginal time to that. Using your talents to try to develop more math sounds to a lot of people like it might be a waste of effort. I know I’m asking a lot of you, but as far as I can tell, figuring out how to do this well seems like the best thing you can do.
Not all political activism has to be waving flags around and chanting chants. Sometimes activists actually have goals and then accomplish something. I think we should try to learn from those people, however lowly your opinion of them might be, if we don’t seem to have many other options.
There’s a (hopefully obvious) failure mode where the AGI doomer walks up to the AI capabilities researcher and says “Screw you for hastening the apocalypse. You should join me in opposing knowledge and progress.” Then the AI capabilities researcher responds “No, screw you, and leave me alone”. Not only is this useless, but it’s strongly counterproductive: that researcher will now be far more inclined to ignore and reject future outreach efforts (“Oh, pfft, I’ve already heard the argument for that, it’s stupid”), even if those future outreach efforts are better.
So the first step to good outreach is not treating AI capabilities researchers as the enemy. We need to view them as our future allies, and gently win them over to our side with the force of good arguments that meet them where they are, in a spirit of pedagogy and truth-seeking.
(You can maybe be more direct with someone about their counterproductive capabilities research when they’re already sold on AGI doom. That’s probably why your conversation on the EleutherAI Discord went OK.)
(In addition to “it would be directly super-counterproductive”, a second-order reason not to try to sabotage AI capabilities research is that “the kind of people who are attracted to movements that involve sabotaging enemies” has essentially no overlap with “the kind of people who we want to be part of our movement to avoid AGI doom”, in my opinion.)
So I endorse “get existing top AGI researchers to stop” as a good thing in the sense that if I had a magic wand I might wish for it (at least until we make more progress on AGI safety). But that’s very different from thinking that people should go out and directly try to do that.
Instead, I think the best approach to “get existing top AGI researchers to stop” is producing good pedagogy, and engaging in gentle, good-faith arguments (as opposed to gotchas) when the subject comes up, and continuing to do the research that may lead to more crisp and rigorous arguments for why AGI doom is likely (if indeed it’s likely) (and note that there are reasonable people who have heard and parsed and engaged with all the arguments about AGI doom but still think the probability of doom is <10%).
I do a lot of that kind of activity myself (1,2,3,4,5, etc.).
The history of cryonics’ PR failure has something to teach here.
Dozens of deeply passionate and brilliant people all trying to make a case for something that in fact makes a lot of sense…
…resulted in it being seen as even more fringe and weird.
Which in turn resulted in those same pro-cryonics folk blaming “deathism” or “stupidity” or whatever.
Which reveals that they (the pro-cryonics folk) had not yet cleaned up their motivations. Being right and having a great but hopeless cause mattered more than achieving their stated goals.
I say this having been on the inside of this one for a while. I grew up in this climate.
I also say this with no sense of blame or condemnation. I’m just pointing out an error mode.
I think you’re gesturing at a related one here.
This is why I put inner work as a co-requisite (and usually a prerequisite) for doing worthwhile activism. Passion is an anti-helpful replacement for inner insight.
I think this is basically correct: if people don’t get right with their own intentions and motivations, it can sabotage their activism work.
Correct, and that’s why I took that approach.
To this effect I have advocated that we should call it “Different Altruism” instead of “Effective Altruism”: by leading with the idea that a movement does altruism better than the status quo, we are going to trigger and alienate people who are part of the status quo, people we could instead have won over by being friendly and gentle.
I often imagine a world where we had ended up with a less aggressive and impolite name attached to our arguments. I mean, think about how virality works: making every single AI researcher even slightly more resistant to engaging with your movement (by priming them to be defensive) is going to have a massive impact on the probability of ever reaching critical mass.
It’s not taboo. I’ve been discussing whether we should do this with various people off and on for the past five years. People take these ideas seriously. Just because people don’t agree, or don’t take it seriously enough, doesn’t mean it’s taboo!
FWIW I think it’s a good idea too (even though for years I argued against it!). I think it should be done by a well-coordinated group of people who did lots of thinking and planning beforehand (and coordinating with the broader community probably) rather than by lone wolves (for unilateralist’s curse reasons.)
It seems “taboo” to me. Like, when I go to think about this, I feel … inhibited in some not-very-verbal, not-very-explicit way. Kinda like how I feel if I imagine asking an inane question of a stranger without a socially sensible excuse, or when a clerk asked me why I was buying so many canned goods very early in Covid.
I think we are partly seeing the echoes of a social flinch here, somehow. It bears examining!
Open tolerance of the people involved with status quo and fear of alienating / making enemies of powerful groups is a core part of current EA culture! Steve’s top comment on this post is an example of enforcing/reiterating this norm.
It’s an unwritten rule that seems very strongly enforced yet never really explicitly acknowledged, much less discussed. People were shadow-blacklisted by CEA from the Covid documentary it funded for being too disrespectful in their speech about how governments have handled Covid. That fits what I’d consider a taboo: something any socially savvy person would pick up on and internalize if they were around it.
Maybe this norm of open tolerance is downstream of the implications of truly considering some people to be your adversaries (which you might do if you thought delaying AI development by even an hour was a considerable moral victory, as the OP seems to). Doing so does expose you to danger. I would point out that lc’s post analogizes their relationship with AI researchers to Israel’s relationship with Iran, and when I think of Israel’s resistance to Iran, nonviolence is not the first thing that comes to mind.
I agree. I also think this is a topic that needs to be seriously considered and discussed, because not doing so may leave behind a hidden hindrance to accurate collective assessment and planning for AI risks. Contrary to our conceits and aspirations, our judgements aren’t at all immune to the sway of biases, flawed assumptions, and human emotions. I’m not sure how to put this, but people on this forum don’t come off as very worldly, if that makes sense. A lot of people are in technical professions where understanding of political realities seems to be lacking. The US and China stand to be the two major drivers of AI development in the coming decades. Increasingly they don’t see eye to eye, and an arms-race dynamic might develop. So I feel there’s been a lot of focus on the technical/theoretical side of things, but not enough concern over the practical side of development, the geopolitical implications, and all that might entail.
FYI, I thought this sort of idea was an obvious one, and I’ve been continuously surprised that it didn’t have more discussion. I don’t feel inhibited and am sort of surprised you are.
(I do think there’s a lot of ways to do this badly, with costs on the overall coordination-commons, so, maybe I feel somewhat inhibited from actually going off to do the thing. But I don’t feel inhibited from brainstorming potential ways to address the costs and thinking about how to do it)
(kinda intrigued by the notion of there being dark-matter taboos)
OK! Well, I can’t speak for everyone’s experiences, only my own. I don’t think this subject should be taboo and I’m glad people are talking more about it now.
I feel similarly.
I also find it somewhat taboo but not so much that I haven’t wondered about it.
You’re right that it’s not actually taboo. Shouldn’t have used that word. Just seemed so far like I got a lot of weird resistance when I brought it up.
My experience agrees with yours. I thought “taboo” was a bit strong, but I immediately got what you meant and nodded along.
Daniel, why did you argue against it and what changed your mind?
The main argument I made was: It’s already very hard to get governments and industry to meaningfully limit fossil fuel emissions, even though the relevant scientists (climatologists etc.) are near-unanimous about the long-term negative consequences. Imagine how much harder it would be if there wasn’t a separate field of climatology, and instead the only acknowledged experts on the long-term effects of fossil fuels were… petroleum industry engineers and executives! That’s the situation with AGI risk. We could create a separate field of AI risk studies, but even better would be to convince the people in the industry to take the risk seriously. Then our position would be *better* than the situation with climate change, not worse. How do we do this? Well, we do this by *not antagonizing the industry*. So don’t call for bans, at least not yet.
The two arguments that changed my mind:
(a) We are running out of time. My timelines dropped from median 2040ish to median 2030ish.
(b) I had assumed that banning or slowing down AGI research would be easier the closer we get to AGI, because people would “wake up” to the danger after seeing compelling demonstrations and warning shots etc. However I am now unsure; there’s plausibly a “whirlpool effect” where the closer you get to AGI the more everyone starts racing towards it and the harder it is to stop. Maybe the easiest time to ban it or slow it down is 10 years, even 20 years, before takeoff. (Compare to genetically engineered superbabies. Research into making them was restricted and slowed down decades before it became possible to do it, as far as I can tell.)
Ah, OK, that all sounds pretty sensible. I think ‘this sort of thinking’ is doing a lot of work here. I agree that sponsoring mobs to picket DeepMind HQ is just silly and will probably make things harder. I think buttonholing DeepMind people and government people and trying to convince them of the dangers of what they’re doing is something we should have been doing all along.
I got the impression that the late Dominic Cummings was on our side just organically without anyone needing to persuade him, in fact I seem to remember Boris Johnson saying something silly about terminators just before the covid kerfuffle broke out. It may not be too hard to convince people.
If at the very least we can get people to only do this sort of thing in secret government labs run by people who know perfectly well that they’re messing with world-ending weapons in defiance of international treaties that’s a start. Not that that will save us, but it might be slower than the current headlong rush to doom. If things go really well we might hang on long enough to experience grey goo or a deliberately engineered pandemic!
Half the problem is that the people who actually do AI research seem divided as to whether there’s any danger. If we can’t convince our own kind, convincing politicians and the like is going to be hard indeed.
All I’ve got is ill-formed intuitions. I read one science-fiction story about postage stamps ten years ago and became a doomer on the spot. I think maths and computer people are unusually easy to convince with clear true arguments and if we come up with some we might find getting most of the industry on side easier than everyone seems to think.
Have we tried to actually express our arguments in some convincing way? I’m thinking it’s not actually a very complicated argument, and most of the objections people come up with on the spot are easy enough to counter convincingly. Some sort of one-page main argument with an FAQ for frequently thought of objections might win most of the battle. I don’t suppose you happen to know of one already constructed do you?
As you point out and I agree, if we don’t win this battle then the world just suddenly ends in about ten years time so we should probably have a pop at easy routes to victory if they’re available and don’t have any obvious downsides.
Did he die? If so, it’s not in the news. (I mean, I did a quick search and didn’t find it.)
Oh god, sorry, I just can’t stop myself. I mean his political reputation is shredded beyond hope of repair. Loathed by the people of the UK in the same way that Tony Blair is, and seen as brilliant but disloyal in the same way that the guy in Mad Men is after he turns on the tobacco people.
We may be touching on the mind-killer here. Let us speak of such things no further.
Dominic Cummings lives, a prosperous gentleman.
Dominic Cummings is not dead, and I should remember that my ironic flourishes are likely to be taken literally because other people on the internet don’t have the shared context that I would have if I was sounding off in the pub.
Thanks for the clarification!
No, I think John is saying he died politically; that is, he no longer holds power. This is definitely overstated (he might get power in the future) and confusing.
Agree that this is definitely a plausible strategy, and that it doesn’t get anywhere near as much attention as it seemingly deserves, for reasons unknown to me. Strong upvote for the post, I want to see some serious discussion on this. Some preliminary thoughts:
How did we get here?
If I had to guess, the lack of discussion on this seems likely due to a founder effect. The people pulling the alarm in the early days of AGI safety concerns were disproportionately on the technical/philosophical side rather than the policy/outreach/activism side.
In early days, focus on the technical problem makes sense. When you are the only person in the world working on AGI, all the delay in the world won’t help unless the alignment problem gets solved. But we are working at very different margins nowadays.
There’s also an obvious trap which makes motivated reasoning really easy. Often, the first thing that occurs to people thinking about slowing down AGI development is sabotage—maybe because this feels urgent and drastic? It’s an obviously bad idea, and maybe that leads us to motivated stopping.
Maybe the “technical/policy” dichotomy is keeping us from thinking of obvious ways we could be making the future much safer? The outreach org you propose doesn’t really fit either category. I’d be interested in brainstorming other major ways to affect the world, but I’m not going to do that in this comment.
HEY! FTX! OVER HERE!!
You should submit this to the Future Fund’s ideas competition, even though it’s technically closed. I’m really tempted to do it myself just to make sure it gets done, and very well might submit something in this vein once I’ve done a more detailed brainstorm.
Probably a good idea, though I’m less optimistic about the form being checked. I’ll plan on writing something up today. If I don’t end up doing that today for whatever reason, akrasia, whatever, I’ll DM you.
I am confident that they would not let it being technically closed stop them from considering the proposal if someone they respected pointed at the proposal (and likely even if they didn’t), and I’d be happy to do the pointing for you if necessary. If that seems necessary, DM me.
Please DM me if you end up starting some sort of project along these lines. I have some (admittedly limited) experience working in media/public relations, and can probably help a bit.
For anyone else also interested, you should add yourself on this spreadsheet. https://docs.google.com/spreadsheets/d/1WEsiHjTub9y28DLtGVeWNUyPO6tIm_75bMF1oeqpJpA/edit?usp=sharing
It’s very useful for people building such an organisation to know of interested people, and vice versa.
If you don’t want to use the spreadsheet, you can also DM me and I’ll keep you in the loop privately.
If you’re making such an organisation, please contact me. I’d like to work with you.
Also, the people pulling the alarm in the early days of AGI safety concerns are themselves people interested in AGI. They find it cool. I get the impression that some of them think aligned people should also try to win the AGI race, so doing capabilities research while being willing to listen to alignment concerns is good. (I disagree with this position, and I don’t think it’s a strawman, but it might be a bit unfair.)
Many of the people that got interested in AGI safety later on also find AGI cool, or have done some capabilities research (e.g. me), so thinking that what we’ve done is evil is counterintuitive.
Yeah, most people in the AGI safety concern group are also technological progress enthusiasts (with good reason; technological progress is generally awesome). And the recent string of “alignment startup” foundings points toward an unhealthy mix of capability and safety research.
One thing to consider: if we successfully dissuade DeepMind researchers, who actually do take alignment issues a little bit seriously, from working on AGI, does it instead get developed by Meta researchers who (for the sake of argument) don’t care?
More generally you’re not going to successfully remove everyone from the field. So the people you’ll remove will be those who are most concerned about alignment, leaving those who are least concerned to discover AGI.
It’s certainly a consideration, and I don’t want us to convince anybody who is genuinely working on alignment at DeepMind to leave. On the other hand, I don’t think their positive affinity towards AGI alignment matters much if what they are doing is accelerating its production. This seems a little like saying “you shouldn’t get those Iranian nuclear researchers to quit, because you’re removing the ‘checks’ on the other more radical researchers”. It’s probably a little naive to assume they’re “checking” the other members of DeepMind if their name is on this blog post: https://www.deepmind.com/blog/generally-capable-agents-emerge-from-open-ended-play.
This is also probably ignoring the maybe more plausible intermediate success of outreach, which is to get someone who doesn’t already have concerns to have them, who then keeps their job because of inertia. We can still do a lot of work getting those who refuse to quit to help move organizations like OpenAI or DeepMind more toward actual alignment research, and create institutional failsafes.
I think a case could be made that if you don’t want Eleuther making X public, then it is also bad for DeepMind or OpenAI to make X.
I would make that case. In this circumstance it’s also definitely the far lesser evil.
Under the pessimistic hypothesis, it does not matter who develops AGI; in any case, our death is almost certain.
The goal here should be social consensus about individual disincentives to misaligned deployment—stopping individual research labs has a pretty modest ROI. (If they pivot their attention given social consensus, that’s up to them.)
Not that you said otherwise, but just to be clear: it is not the case that most capabilities researchers at DeepMind or OpenAI have similar beliefs as people at EleutherAI (that alignment is very important to work on). I would not expect it to go well if you said “it seems like you guys are speeding up the deaths of everyone on the planet” at DeepMind.
Obviously there are other possible strategies; I don’t mean to say that nothing like this could ever work.
Completely understood here. It’d be different for OpenAI, even more different for DeepMind. We’d have to tailor outreach. But I would like to try experimentation.
We can’t take this for granted: when A tells B that B’s views are inconsistent, the standard response (afaict) is for B to default in one direction (and which direction is often heavily influenced by their status quo), make that direction their consistent view, and then double down every time they’re pressed.
It’s possible that we have ~1 shot per person at convincing them.
I’ve found over the years that people only ever get really angry at you for saying curious things if they think those ideas might be true. Maybe we’re already half-way there at DeepMind!
Thanks a lot for doing this and posting about your experience. I definitely think that nonviolent resistance is a weirdly neglected approach. “mainstream” EA certainly seems against it. I am glad you are getting results and not even that surprised.
You may be interested in discussion here, I made a similar post after meeting yet another AI capabilities researcher at FTX’s EA Fellowship (she was a guest, not a fellow): https://forum.effectivealtruism.org/posts/qjsWZJWcvj3ug5Xja/agrippa-s-shortform?commentId=SP7AQahEpy2PBr4XS
Are you aware of Effective Altruism’s AI governance branch? I didn’t look into it in detail myself, but there are definitely dozens of people already working on outreach strategies that they believe to be the most effective. FHI, CSER, AI-FAR, GovAI, and undoubtedly more groups have projects ongoing for outreach, political intervention, etc. with regards to AI Safety. If you want to spend your marginal time on stuff like this, contact them.
It does appear true that the lesswrong/rationalist community is less engaged with this strategy than might be wise, but I'm curious whether those organisations would say that people currently working on technical alignment research should switch to governance/activism, and what their opinion is on activism. 80,000 Hours places AI technical research above AI governance in their career impact stack, though personal fit plays a major part.
I was not aware. Are these outreach strategies towards the general public with the aim of getting uninvolved people to support AI alignment efforts, or are they toward DeepMind employees to get them to stop working so hard on AGI? I know there are lots of people raising awareness in general, but that’s not really the goal of the strategy that I’ve outlined.
Much of the outreach efforts are towards governments, and some to AI labs, not to the general public.
I think that because of the way crisis governance often works, if you're the designated expert in a position to provide options to a government when something's clearly going wrong, you can get buy-in for very drastic actions (see e.g. COVID lockdowns). So the plan is partly to become the designated experts.
I can imagine (not sure if this is true) that even though an 'all of the above' strategy like you suggest seems on paper like it would be the most likely to produce success, you'd get less buy-in from government decision-makers and be less trusted by them in a real emergency if you'd previously been causing trouble with grassroots advocacy. So maybe that's why it's not been explored much.
This post by David Manheim does a good job of explaining how to think about governance interventions, depending on different possibilities for how hard alignment turns out to be: https://www.lesswrong.com/posts/xxMYFKLqiBJZRNoPj/
In case of interest, I’ve been conducting AI strategy research with CSER’s AI-FAR group, amongst others a project to survey historical cases of (unilaterally decided; coordinated; or externally imposed) technological restraint/delay, and their lessons for AGI strategy (in terms of differential technological development, or ‘containment’).
(see longlist of candidate case studies, including a [subjective] assessment of the strength of restraint, and the transferability to the AGI case)
This is still in-progress work, but will be developed into a paper / post within the next month or so.
One avenue that I've recently gotten interested in, though I've only just gotten to read about it and have large uncertainties about it, is the phenomenon of 'hardware lotteries' in the historical development of machine learning (see https://arxiv.org/abs/2009.06489), which describes cases where the development of particular types of domain-specialized compute hardware makes it more costly [especially for e.g. academic researchers, probably less so for private labs] to pursue particular new research directions.
These are really interesting, thanks for sharing!
It strikes me that 80,000 Hours puts you there just about when the prediction markets are predicting AGI to be available, i.e., a bit late. I wonder if EA folks still think government roles are the best way to go?
Am I the only one who thinks that the world as it is is unbelievably, fantastically, super bad? The concept of an AI destroying the world would only be bad because it would prevent a very good potential future from coming into existence, that could not otherwise happen. Stopping the AI would remove all hope of it ever happening.
I’m happy most days.
Not speaking for Flaglandbase, but I’d argue the world right now (or rather, life on earth) is super bad because it’s dominated by animal suffering. I’m also happy most days.
I agree with this, and the overall history of the world has definitely been, on balance, one of extreme suffering.
For farmed animals in particular, we don’t need AGI to end their plight. Just regular economic growth and advocacy will do.
Also, given how much time we’ve been suffering already, and how much is at stake; would it be so bad to delay AGI by 100 or 200 years? We can do a lot of alignment research in that time.
Yeah, if I got to decide, I would barely factor in how bad the world is right now. Delay AGI until it’s outweighed by other x-risks.
Depends how much you value suffering vs pleasure I guess. If you think it’s better to exist, and experience positive things at the cost of great suffering, then this world is pretty awesome. If you would rather not exist (or more accurately believe that most beings would rather not exist if they had the choice), then things look pretty bad…
I agree the world right now is super bad. However, “delay AGI until we really know what we’re doing” doesn’t seem all that much harder than “delay AGI forever”, and most people do agree that alignment is solvable. Right now, we still have far more people working on capability (and they’ve been working on it for far longer); if we could change this, alignment may even get solved relatively quickly.
Eliezer has said this explicitly, e.g. on the sam harris podcast (CTRL+F “alignment is impossible”)
I predict that most creatures disagree with you, if an honest poll about themselves was done, and not about some far abstraction of other people. (EDIT: Link is about humans, but I predict most non-humans also prefer to be alive and aren’t better off dead.)
Which is also my prior on the attitude of "it's fine if everyone dies" people. In historical cases where someone thought that, few people agreed, and we ended up glad they didn't get their way. I'm sure it's the same all over again here with you, and with some other people I've heard express this attitude.
You say “creatures”, but the linked source seems to be only about humans.
Yes, but I predict it will end up applying to most non-humans too.
Why would you be able to generalize from humans to factory farmed animals?
Most animals aren’t in a factory farm.
I think this post makes sense given the premises/arguments that I think many people here accept: that AG(S)I is either amazingly good or amazingly bad, and that getting the good outcome is a priori vastly improbable, and that the work needed to close the gap between that prior and a good posterior is not being done nearly fast enough.
I don’t reject those premises/arguments out of hand, but I definitely don’t think they’re nearly as solid as I think many here do. In my opinion, the variance in goodness of reasonably-thinkable post-AGSI futures is mind-bogglingly large, but it’s still probably a bell curve, with greater probability density in the “middle” than in super-heaven or ultra-hell. I also think that just making the world a better place here and now probably usually helps with alignment.
This is probably not the place for debating these premises/arguments; they’re the background of this post, not its point. But I do want to say that having a different view on that background is (at least potentially) a valid reason for not buying into the “containment” strategy suggested here.
Again, I think my point here is worthwhile to mention as one part of the answer to the post’s question “why don’t more people think in terms of containment”. I don’t think that we’re going to resolve whether there’s space in between “friendly” and “unfriendly” right here, though.
We might. High dimensional space, tiny target area for anything particularly emotionally salient. Like finding a good book in the Library of Babel. Mostly the universe gets turned into random rubbish.
If this is saying that there’s no plausible downside, that statement seems incorrect. It’s not a very important bit whether or not someone has a narrative of “working on AGI”. It takes 2 minutes to put that on a resume, and it would even be defensible if you’d ever done anything with a neural net. More important is the organizing principles of their technical explorations. There’s a whole space of possible organizing principles. If you’re publicizing your AI capabilities insights, the organizing principle of “tweak and mix algorithms to get gains on benchmarks” is less burning the AGI fuse than the organizing principle of “try new algorithms” is less burning the AGI fuse than the organizing principle of “try to make AGI”. To argue that AGI kills you by default, you argue that there are algorithmic inventions to be found that generalize across domains to enable large capability jumps, without having a controllable / understood relationship to goals. Which a fortiori says something about the power of generalization. Which might change how people organize their research. If “X is possible” can be the main insight of a research breakthrough, then “AGI is dangerous” contains “AGI is powerful” contains “power is possible”, and could have similar effects.
On another note:
There’s other downsides here. I don’t think that means this class of strategy should be taboo, and I think this class of strategy absolutely should be worked on. But pursuing a strategy without noticing the downsides, and noticing the ways the strategy is doomed to not actually help at all, is pretty crucial.
Downside: depending on implementation, you turn yourself into an ideological repeater. (This means you probably end up talking to people selected for, at least while they’re talking to you, being themselves ideological repeaters, which makes your strategy useless.) So you cause top AGI researchers to be more in an environment filled with ideological repeaters. So you cause it to be correct for top AGI researchers to model people around them as not having coherent perspectives, but instead to be engaging in ideological power struggles. If top AGI researchers are treating people that way, that bodes poorly for mind-changing communication.
Bruce Bueno de Mesquita’s game theoretic computer models predict that this is what happens for the sanctions in Iran. They destroy the domestic opposition to developing nuclear weapons.
Going a bit meta, the fact that this post seems to receive majority agreement makes me question the degree to which the consensus against it is real (as opposed to a signaling equilibrium). And I also want to mention that I figured from the title that this was about AI boxing, and it's possible that others haven't clicked on it because AI boxing doesn't seem that interesting.
I’m shit at titles. How do I do the title without giving away the punchline?
It’s a great title for what the post says! And it drew my interest immediately.
If what you want is a clickbait title, “Break Glass in Case of Doom”, “10 reasons why doom is not inevitable”, “The Doom-Averting Secret that Big AI Alignment Doesn’t Want You to Know About”, etc… Although those are all old-people clickbait. Modern clickbait probably works differently: “This fashionable teenage influencer wants to use the power of TikTok to save the world?!”
I would have probably called it something like "The Case for the Social Approach to AI Safety", which would give away the punchline. Maybe "What can actually still be done if alignment is doomed".
What do you think about offering an option to divest from companies developing unsafe AGI? For example, by creating something like an ESG index that would deliberately exclude AGI-developing companies (Meta, Google etc) or just excluding these companies from existing ESGs.
The impact = making AGI research a liability (being AGI-unsafe costs money) + raising awareness in general (everyone will see AGI-safe & AGI-unsafe options in their pension investment menu + a decision itself will make a noise) + social pressure on AGI researchers (equating them to fossil fuels extracting guys).
Do you think this is implementable short-term? Is there a shortcut from this post to whoever makes decisions at BlackRock & Co?
At first my reaction was something like, “the teams have been acquired by large trillion dollar technology companies and so a dollar moved away from those companies is probably a bit less than a penny moved away from AGI development. This sounds very inefficient.”
But as a publicly announced way to incentivize defunding DeepMind it's at least theoretically very efficient. If I controlled BlackRock I could say "I will divest x$ from Google as a whole unless you divest x/y$ from DeepMind's meta research toward legitimate AI safety" and it would be pretty strongly in Google's interest to do so. The difficulties lie in all of the details: you'd want to make the campaign extremely boring except for the people you care about, evaluate the leadership of Google/Facebook/Microsoft to see how they'd react, coordinate the funds so that the shareholder activists have clear and almost auto-enforced terms, etc. The failure mode for this, ironically, looks something like how the U.S. does its sanctions, where we do a very poor job of dictating clear terms and goals, and of increasing or decreasing pressure quickly and transparently in response to escalation and de-escalation.
We also really wouldn’t want these strategies to backfire by giving any fired meta researchers a reason to hate us personally and be less sympathetic. Finding some way to “cancel” AGI researchers would honestly feel really good to me but even under the best circumstances it’d be really ineffective. We don’t want them disgruntled and working somewhere else on the same thing, I want them to have something else to do that doesn’t lead to the collapse of the lightcone.
A suggestion from my brother: https://forum.effectivealtruism.org/posts/KigFfo4TN7jZTcqNH/the-future-fund-s-project-ideas-competition?commentId=hRJxxhKtbKj8fhDd5
I’m not persuaded by this argument because developing AGI is not purely a negative. AGI could either have a massive positive result (an aligned AI) or a massive negative result (an unaligned AI). Because of winner-takes-all effects, it’s likely that whichever AI is developed first will determine the overall outcome. So one way to decrease the chance of an unfriendly AI is to stop all AI research and another way to do this is to develop a friendly AI first.
We know that multiple agencies worldwide are working on developing AI. Moreover, it might be that the advancement of raw computing power will mean that eventually any competent person will be able to create an AGI, at which point it is impossible to directly stop the development of an AGI. So either we need to shut down the development of computing hardware and shut down all AI laboratories globally, or we need to develop a friendly AI first.
Doing the first requires coordinating multiple opposed countries to do something that none of them naively want to do for a reason that will be very difficult to convince politicians of (look at the disagreements even among AI researchers!). This is the climate change problem but on hardcore mode. It is possible to work hard and delay this some, especially since many AI companies are in the Silicon Valley bubble, but it is probably not possible to halt it. We can buy years and maybe decades, but not centuries. The only way to buy centuries is to develop friendly AI first.
We certainly don’t know how to do that yet, but it’s much more likely to be done by people who care about AI safety than to be done by people who don’t care about AI safety.
The consensus among alignment researchers is that if AGI were developed right now it would be almost certainly a negative. We simply don’t know how one would ensure a superintelligence was benevolent yet, even theoretically. The argument is more convincing if you agree with that assessment, because the only way to get benevolent AI becomes to either delay the creation of AGI until we do have that understanding or hope that the understanding arrives in time.
The argument also becomes more convincing if you agree with the assessment that advancements toward AGI aren't going to be driven mostly by Moore's law and are instead going to be concentrated in a few top research and development companies: DeepMind, Facebook AI labs, etc. That's my opinion and it's also one I think is quite reasonable. Moore's law is slowing down. It's impossible for someone like me to predict how exactly AGI will be developed, but when I look at the advancements in AGI-adjacent capabilities research in the last ten years, it seems like the big wins have come from research and willingness to spend from the big players, not increased GPU power. It's not like we know of some algorithm right now which we just need 3 OOMs more compute for, that would give us AGI. The exception to that would maybe be full brain emulation, which obviously comes with reduced risk.
We also don’t know what an actual superintelligence would look like; it could be that the lack of alignment understanding is an inevitable consequence of our capabilities understanding not being there yet.
For the nuclear analogy, you couldn't design a safe nuclear power plant before you understood what nuclear fission and radioactivity were in the first place. As another example, InstructGPT is arguably a more "aligned" version of GPT-3, but it seems unlikely that anyone could have figured out how to better align language models before language models were invented.
Could you say more about this hypothesis? To me, it feels likely that you can get crazy capabilities from a black box that you don’t understand and so whose behavior/properties you can’t verify to be acceptable. It’s not like once we build a deceptive model we will know what deceptive computation looks like and how to disincentivize it (which is one way your nuclear analogy could translate).
It’s possible, also, that this is about takeoff speeds, and that you think its plausible that e.g. we can disincentivize deception by punishing the negative consequences it entails (if FOOM, can’t since we’d be dead).
Or maybe once our understanding of intelligent computation in general improves, it will also give us the tools for better identifying deceptive computation.
E.g. language models are already “deceptive” in a sense—asked something that it has no clue about, InstructGPT will happily come up with confident-sounding nonsense. When I shared that, multiple people pointed out that its answers sound like the kind of a student who’s taking an exam and is asked to write an essay about a topic they know nothing about, but they try to fake their way through anyway (that is, they are trying to deceive the examiner). Thus, even if you are doing pure capabilities research and just want your AI system to deliver people accurate answers, it is already the case that you can see a system like InstructGPT “trying to deceive” people. If you are building a question-answering system, you want to build one that people can trust to give accurate answers rather than impressive-sounding bullshit, so you have the incentive to work on identifying and stopping such “deceptive” computations as a capabilities researcher already.
This means that the existence of InstructGPT gives you both 1) a concrete financial incentive to do research for identifying and stopping deceptive computation 2) a real system that actually carries out something like deceptive computation, which you can experiment on and whose behavior you can make use of in trying to understand the phenomenon better. That second point is something that wouldn’t have been the case before our capabilities got to this point. And it might allow us to figure out something we wouldn’t have thought of before we had a system with this capability level to tinker with.
[ETA: I’m not that sure of the below argument]
Thanks for the example, but it still seems to me that this sort of thing won’t work for advanced AI. If you are familiar with the ELK report, you should be able to see why. [Spoiler below]
Even if you manage to learn the properties of what looks like deception to humans, and instill those properties into a loss function, then it seems like you are still more likely to get a system that tells you what humans think the truth is, avoiding what humans would be able to notice as deception, rather than telling you what the truth actually seems to be (given what it knows). The reason is that, as AI develops, programs that are capable of the former thing have constant complexity, but programs that are capable of the latter thing have complexity that grows with the complexity of the AI’s models of the world, and so you should expect that the former is favored by SGD. See this part of the ELK document for a more detailed description of this failure mode.
What sort of thing? I didn’t mean to propose any particular strategy for dealing with deception, I just meant to say that now OpenAI has 1) a reason to figure out deception and 2) a concrete instance of it that they can reason about and experiment with and which might help them better understand exactly what’s going on with it.
More generally, the whole possibility that I was describing was that it might be impossible for us to currently figure out the right strategy since we are missing some crucial piece of understanding. If I could give you an example of some particularly plausible-sounding strategy, then that strategy wouldn’t have been impossible to figure out with our current understanding, and I’d be undermining my whole argument. :-)
Rather, my example was meant to demonstrate that it has already happened that
Progress in capabilities research gives us a new concrete example of how e.g. deception manifests in practice, that can be used to develop our understanding of it and develop new ideas for dealing with it.
Capabilities research reaches a point where even capabilities researchers have a natural reason to care about alignment, reducing the difference between “capabilities research” and “alignment research”.
Thus, our understanding and awareness of deception is likely to improve as we get closer to AGI, and by that time we will have already learned a lot about how deception manifests in simpler systems and how to deal with it, and maybe some of that will suggest principles that generalize to more powerful systems as well (even if a lot of it won’t).
It’s not that I’d put a particularly high probability on InstructGPT by itself leading to any important insights about either deception in particular or alignment in general. I-GPT is just an instance of something that seems likely to help us understand deception a little bit better. And given that, it seems reasonable to expect that further capabilities development will also give us small insights to various alignment-related questions, and maybe all those small insights will combine to give us the answers we need.
I mean to argue against your meta-strategy which relies on obtaining relevant understanding about deception or alignment as we get larger models and see how they work. I agree that we will obtain some understanding, but it seems like we shouldn’t expect that understanding to be very close to sufficient for making AI go well (see my previous argument), and hence not a very promising meta-strategy.
I read your previous comment as suggesting that the improved understanding would mainly be used for pursuing a specific strategy for dealing with deception, namely “to learn the properties of what looks like deception to humans, and instill those properties into a loss function”. And it seemed to me that the problem you raised was specific to that particular strategy for dealing with deception, as opposed to something else that we might come up with?
This isn’t true. [ETA: I linked the wrong survey before.]
I don’t see anything in the linked survey about a consensus view on total existential risk probability from AGI. The survey asked researchers to compare between different existential catastrophe scenarios, not about their total x-risk probability, and surely not about the probability of x-risk if AGI were developed now without further alignment research.
Maybe Carl meant to link this one
You’re right, my link was wrong, that one is a fine link.
You’re right, I linked the wrong survey!
I notice you didn’t mention EleutherAI.
Is that true? I thought that I had read Yudkowsky estimating that the probability of an AGI being unfriendly was 30% and that he was working to bring that 30% to 0%. If alignment researchers are convinced that this is more like 90+%, I agree that the argument becomes much more convincing.
I agree that these two questions are the cruxes in our positions.
That’s not Yudkowsky’s current position. https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy describes the current view and in the comments, you see the views of other people at MIRI.
Yudkowsky is at 99+% that AGI right now would kill humanity.
Also, look at his bet with Bryan Caplan. He’s not joking.
And, also, Jesus, Everyone! Gradient Descent, is just, like, a deadly architecture. When I think about current architectures, they make Azathoth look smart and cuddly. There’s nothing friendly in there, even if we can get cool stuff out right now.
I don’t even know anymore what it is like to not see it this way. Does anyone have a good defense that current ML techniques can be stopped from having a deadly range of action?
Probably not; Eliezer addressed this in Q6 of the post, and while it’s a little ambiguous, I think Eliezer’s interactions with people who overwhelmingly took it seriously basically prove that it was serious; see in particular this interaction.
(But can we not downvote everyone into oblivion just for drawing the obvious conclusion without checking?)
I first heard Eliezer describe “dying with dignity” as a strategy in October 2021. I’m pretty sure he really means it.
I am not sure if he's given another number explicitly, but I'm almost positive that Yudkowsky does not believe that. The probability that an AGI will end up being aligned "by default" is epsilon. Maybe he said at one point that there was a 30% chance that AGI will be what destroys the world if it's developed, given alignment efforts, but that doesn't sound to me like him either.
You should read the most recent post he made on the subject; it's extraordinarily pessimistic about our future. He mentions multiple times that he thinks the probability of success here needs to be measured in log-odds. He very sarcastically uses April Fools at the end as a sort of ambiguity shield, but I don't think anybody believes he isn't being serious.
I'm not convinced that the odds mentioned in that post are meant to be taken literally, given it being an April Fools post, as opposed to just being metaphorical and pointing in a direction.
He does also mention in that post that in the past he thought the odds were 50%, so perhaps I’m just remembering an old post from sometime between the 50% days and the epsilon days.
The most optimistic view I’ve heard recently is Vanessa Kosoy claiming 30% chance of pulling it off. Not sure where consensus would be, but I read MIRI as ‘almost certain doom’. And I can’t speak for Eliezer, but if he ever thought that there was ever any hope that AGI might be aligned ‘by chance’, that thought is well concealed in everything he’s written for the last 15 years.
What he did once think was that it might be possible, with heroic effort, to solve the alignment problem.
There is no reason why my personal opinion should matter to you, but it is: “We are fucked beyond hope. There is no way out. The only question is when.”
I don’t know what his earliest writing may have said, but his writing in the past few years has definitely not assigned anywhere near as high a probability as 70% to friendly AI.
Even if he had, and it was true, do you think a 30% chance of killing every human in existence (and possibly all life in the future universe) is in any way a sane risk to take? Is it even sane at 1%?
I personally don’t think advancing a course of action that has even an estimated 1% chance of permanent extinction is sane. While I have been interested in artificial intelligence for decades and even started my PhD study in the field, I left it long ago and have quite deliberately not attempted to advance it in any way. If I could plausibly hinder further research, I would.
Even alignment research seems akin to theorizing a complicated way of poking a sleeping dragon-god prophesied to eat the world, in such a manner that it will wake up friendly instead. Rather than just not poking it at all and making sure that nobody else does either, regardless of how tempting the wealth in its hoard might be.
Even many of the comparatively good outcomes in which superintelligent AI faithfully serves human goals seem likely to be terrible in practice.
It’s worth it to poke the dragon with a stick if you have only a 28% chance of making it destroy the world while the person who’s planning to poke it tomorrow has a 30% chance. If we can prevent those people in a different way then great, but I’m not convinced that we can.
It doesn’t help at all in the case where the research you’re doing makes it significantly more likely that they will be equipped with stronger sticks and have greater confidence in poking the dragon tomorrow.
I don’t want to be provocative, but if there was political will to stop AGI research it could probably be stalled for a long time. In order to get that political will, not only in the West but in China as well, a pretty effective way to do it might be figure out a way to use a pre-AGI model to cause mayhem/harm that’s bad enough to get the world’s attention, while not being apocalyptic.
As a random example, if AI is used somehow to take down the internet for a few days, the discourse and political urgency regarding AGI would change drastically. A close analogue is how quickly the world started caring about gain-of-function research after Covid.
I fear you might be right.
This is a dangerous road to tread—perhaps it is inevitable.
Re: taboos in EA, I think it would be good if somebody who downvoted this comment said why.
I didn’t downvote this just because I disagree with it (that’s not how I downvote), but if I could hazard a guess at why people might downvote, it’d be that some might think it’s a ‘thermonuclear idea’.
I have thought this exact thing for something like 2 years. I do think there are some potential backfire risks which make this an uncertain strategy.
slowing AI progress in the US and Europe, which is where OpenAI, DeepMind, Meta, etc. are located, does not necessarily mean that AI progress in China slows as well. To the degree that you think US AI research will be more aligned than China’s, this differential slowing of AI progress could be net negative.
slowing AI progress in legible companies and labs may push AI research into more illegible places, which may be less careful on net.
Raising the salience of AI, which this campaign would require, could accelerate AI progress by making more governments pay attention to it and fund it more, out of paranoia.
All that said, I think far too little serious thinking/planning has gone into this kind of idea, and it should be more seriously considered. I would be happy to discuss this further if you’d like to chat. Send me a DM when you get a chance.
Iran is an agent, with a constrained amount of critical resources like nuclear engineers, centrifuges, etc.
AI development is a robust, agent-agnostic process that has an unlimited number of researchers working in adjacent areas who could easily cross-train to fill a deficit, an unlimited number of labs which would hire researchers from DeepMind and OpenAI if they closed, and an unlimited number of GPUs to apply to the problem.
Probably efforts at getting the second-tier AI labs to take safety more seriously, in order to give the top tier more slack, will move back AI timelines a little? But most of the activities that my brain labels “nonviolent resistance” are the type that will be counterproductive unless there’s already a large social movement behind them.
Iran is a country, not an agent. Important distinction and I’m not being pedantic here. Iran’s aggressive military stance towards Israel is not quite the result of a robust, agent-agnostic process but it’s not the result of a single person optimizing for some goal either.
It’s important to recognize there is not actually an unlimited number of GPUs, labs, dollars, or talented enough people available to allocate toward AGI development. People in particular are the sticking point for me here; there’s literally a highly constrained number with the intelligence necessary to do and publish high-level productive work in this kind of arena. It’s for this reason that the second-tier labs are already an order of magnitude less effective than DeepMind. Maybe Oppenheimer or John von Neumann were unimportant to atomic bomb development timelines because they would just have been replaced by someone else in the event that they found out it was going to cause the atmosphere to light up, but I doubt it.
This model treats the “product” DeepMind, Meta, and OpenAI generate in publishing research as having some static underlying driving force like “market demand”. I don’t think there is actual market demand, in the monetary sense. From a shareholder’s point of view, spending on arXiv research is mostly wasted money. There might be “demand” for AGI research in the wider sense that there are “charitable” people who like to fund long-term scientific research, but that kind of demand is mostly driven by the starry-eyed and often waning passions of higher-ups at megacorps. A large number of people defecting or leaving for ethical reasons would affect that demand and start to encourage them to see their funding differently.
The point of this strategy is to convince people not to work in AGI-adjacent research labs at all, or to retrofit existing organizations into doing something else, not to get them to quit some specific organization.
Google, Microsoft, and Facebook are all developing AI. So are Tencent, Sberbank, and the NSA. So are thousands of startups.
The most important thing is to solve the real problem—to design AI which, even when autonomous and smarter than human, has “friendly” values. (Just to be concrete, 22:30 here gives an example of what such a design could look like, albeit while still at a very theoretical and schematic stage.)
So you can try to speed up AI alignment research, and you can also try to slow down unaligned AI research. But there are thousands of independent research centers doing unaligned AI. OK, in the present, maybe there’s just a handful which are the ones at immediate risk of creating unaligned, autonomous, smarter-than-human AI. They’re mostly in huge powerful organizations, and there would be secret projects at the same level that you don’t know about—but OK, maybe you can make an impact somehow, by sabotaging TPU production, or by getting the UNSC to work with NeurIPS on a global regulatory regime, or…
You know, it’s going to be hard to stop these organizations from reaching for ultimate power. Putin already said a few years back that the nation that controls AI controls the world. But also, the level of computing power and software knowledge required to become an immediate risk gets lower over time. Maybe you can achieve a temporary tactical success through these preventive interventions, but for a primarily preventive strategy to work, you would have to change the entire direction of technological culture in this multipolar world. Also, the work we already know about, from the big organizations we know about, is already alarmingly close to the point of no return. Proliferation will keep happening, but maybe we already know the names of the candidate organizations that could be the architect of our fate; we just don’t know which it will be.
On the other hand: if you can solve the problem of friendliness, the problem of alignment, in practice and not just in theory (meaning you have enough computational resources to actually run your actually aligned AI program), then that’s it. Thus the focus on solving alignment rather than preventing nonalignment.
If we’re in a situation where it’s an open secret that a certain specific research area leads to general artificial intelligence, we’re doomed. If we get into a position where compute is the only limiting factor, we’re doomed. There’s no arguing there. The goal is to prevent us from getting into that situation.
As it stands now, certainly lots of companies are practicing machine learning. I have secondhand descriptions of a lot of the “really advanced” NSA programs and they fit that bill. Not a lot of organizations I know of, however, are actually pushing the clock hand forward on AGI and meta. Even fewer are doing so consistently, or getting over the massive intellectual hurdles that require a team like DeepMind’s. Completing AGI will probably be the result of a marriage of increased computing power, which we can’t really control, and insights pioneered and published by top labs, which I legitimately think we could influence to some degree by modifying their goals and talking to their members. OpenAI is a nonprofit. At the absolute bare minimum, none of these companies publish their meta research for money. The worst things they seem to do at this stage aren’t achieved when they’re reaching for power so much as when they’re playing an intellectual & status game amongst themselves, and fulfilling their science fiction protagonist syndromes.
I don’t doubt that it would be better for us to have AI alignment solved than to rely on these speculations about how AI will be engineered, but I do not see any good argument as to why it’s a bad strategy.
If we were “doomed” in this way, would you agree that the thing to do—for those who could do it—is to keep trying to solve the problem of alignment? i.e. trying to identify an AI design that could be autonomous, and smarter than human, and yet still safe?
Let me articulate my intuitions in a little bit more of a refined way: “If we ever get to a point where there are few secrets left, or that it’s common knowledge one can solve AGI with ~1000-10,000 million dollars, then delaying tactics probably wouldn’t work, because there’s nothing left for DeepMind to publish that speeds up the timeline.”
Inside those bounds, yes. I still think that people should keep working on alignment today, I just think other dumber people like me should try the delaying tactics I articulated in addition to funding alignment research.
I think the framing of “convince leading AI researchers to willingly work more closely with AI alignment researchers, and think about the problem more themselves” is the better goal. I don’t think hampering them generally is particularly useful/effective, and I don’t think convincing them entirely to “AGI is very scary” is likely either.
This is such a weird sentiment to me. I hear it a lot, and predicated on similar beliefs about AGI, I feel like it’s missing a mood. If someone were misassembling a bomb inside a children’s hospital, would you still say “hampering them generally isn’t particularly useful/effective”? Would your objective be to get them to “think more about the problem themselves”? There’s a lack of urgency in these suggestions that seems unnatural. The overarching instrumental goal is to get them to stop.
I just expect much better outcomes from the path of close collaboration. To me it seems like I am a “nuclear power plant safety engineer”, and they are the “nuclear power plant core criticality engineers”. Preventing the power plant from getting built would mean it wasn’t built unsafely, but since I believe that would leave a void in which someone else would be strongly inclined to build their version… I just see the best likelihood of good outcome in the paths near ‘we work together to build the safest version we can’.
There’s a trap here where the more you think about how to prevent bad outcomes from AGI, the more you realize you need to understand current AI capabilities and limitations, and to do that there is no substitute for developing and trying to improve current AI!
A secondary trap is that preventing unaligned AGI probably will require lots of limited aligned helper AIs which you have to figure out how to build, again pushing you in the direction of improving current AI.
The strategy of “getting top AGI researchers to stop” is a tragedy of the commons: They can be replaced by other researchers with fewer scruples. In principle TotC can be solved, but it’s hard. Assuming that effort succeeds, how feasible would it be to set up a monitoring regime to prevent covert AGI development?
Top researchers are not easy to replace. Without the top 0.1% of researchers, progress would be slowed by much more than 0.1%.
This does make me wonder if activism from scientists has ever worked significantly. https://www.bismarckanalysis.com/Nuclear_Weapons_Development_Case_Study.pdf documents the Manhattan Project, https://www.palladiummag.com/2021/03/16/leo-szilards-failed-quest-to-build-a-ruling-class/ argues that there was partial success.
Thanks for the post! I think asking AI Capabilities researchers to stop is pretty reasonable, but I think we should be especially careful not to alienate the people closest to our side. E.g. consider how the Protestants and Catholics fought even though they agree on so much.
I like focusing on our common ground and using that to win people over.
Overall, I like the post’s emphasis on taking personal action, conditional on technical alignment being unlikely at the current rate of general-purpose AI development or impossible for fundamental reasons.
Two thoughts that I am happy to elaborate on:
Israel and Iran clearly are in an adversarial relationship. We are not in one yet for the most part with AGI researchers and scaleable ML developers. Let’s start by seeking to understand their perspectives, have one-on-one conversations, and find ways to communicate concerns and questions they will need to answer to (under legitimate public pressure).
This seems to suggest ‘try any strategy that might stick for containing AGI development’. I would disagree with this framing. A lot of ways to go about this will be indistinguishable from our community lashing out at another community for random/less noble reasons. You might not have meant it that way though – I admire the way you went about holding conversations in the EleutherAI #ai-alignment Discord channel.
I wonder how many of us don’t want to see AI progress slow down because AI progress keeps proving us right.
After spending at least hundreds of hours reading lesswrong et al. and not being able to alter our path towards AI, I want the satisfaction of telling people “See? Told you so!”
It’s a natural inclination. Unfortunately I don’t think we’re likely to get the chance if things go really badly, and I don’t want to die.
For anyone interested in working on this, you should add yourself on this spreadsheet. https://docs.google.com/spreadsheets/d/1WEsiHjTub9y28DLtGVeWNUyPO6tIm_75bMF1oeqpJpA/edit?usp=sharing
It’s very useful for people building such an organisation to know of interested people, and vice versa.
If you don’t want to use the spreadsheet, you can also DM me and I’ll keep you in the loop privately.
If you’re making such an organisation, please contact me. I’d like to work with you.
What is this supposed to look like if researchers are actually convinced? The whole raison d’être of DeepMind is to create AGI. And while AGI may be a world-ending invention, many of the creations along the way are likely to be very economically valuable to DeepMind’s parent company, Google.
Take for example this blog where Deepmind describes an application of AI to Google’s datacenters that was able to reduce the cooling bill by 40%. What is the economic justification for supporting Deepmind researchers with a minimum salary of $400k if they aren’t producing the kind of products that can do this? And can they produce products like this without working on capabilities?
I think it’s going to be significantly harder to convince these engineers to stop working on AGI if it means taking a 75% paycut. What’s the alternative career path for them?
Strongly agree with this. In my opinion, a more effective intervention in this line would be to side with those who want to curtail the power of tech companies in the USA. For example, breaking up Meta and Google would greatly reduce their ability to fund AI development, and they’re currently responsible for most of the boundary-pushing on AGI.
I think direct outreach to the AI labs is a great idea. (If coordinated and well thought out.) Trying to get them to stop building AGI seems unlikely to help IMO, though I’m not totally against it.
I’d be more interested to see targeted efforts aimed at improving safety outcomes from the major AI labs. Things like:
Getting more value-aligned people in the AIS community onto the safety teams of DeepMind and OpenAI
EA funders offering those labs money to increase headcount on their safety teams
Other efforts to try and help tilt the culture of those labs more toward safety or convince their leadership to prioritize safety more highly
(My background assumptions here are short-ish timelines for AGI/TAI and that with high confidence it will be originating from one of a handful of these AI labs.)
This is something I’ve been thinking about a lot very recently. But as another commenter said, it’s probably better to see what the AI governance folks are up to, since this is essentially what they do.
(I learned today that “AI governance” isn’t just about what governments should do but also strategy around AI labs, etc.)
Why is this important? As far as I can tell, the safety teams of these two organisations are already almost entirely “value-aligned people in the AIS community”. They need more influence within the organisation, sure, but that’s not going to be solved by altering team composition.
rachelAF mentioned that she had the impression their safety teams were more talent-constrained than funding-constrained. So I inferred that getting more value-aligned people onto those teams wouldn’t just alter the team composition, but increase the size of their safety teams.
We probably need more evidence that those teams do still have open headcount though. I know DeepMind’s does right now, but I’m not sure whether that’s just a temporary opening.
You make a good point though. If the safety teams have little influence within those orgs, then #3 may be a lot more impactful than #1.
Interesting, how do you know this? Is there information about these teams available somewhere?
Agreeing with your post, I think it might be important to offer the people you want to reach out to a specific alternative to work on instead (because otherwise we are basically just telling them to quit their job, which nobody likes to hear). One such alternative would be AI alignment, but maybe that is not optimal for impatient people. I assume that researchers at OpenAI and DeepMind are in it because of the possibilities of advanced AI, and that most of them are rather impatient to see them realized. Do you think it would be a good idea to advocate that those who don’t want to work on alignment work on shallow AI instead?
I am also thinking of this blog post, arguing that “It’s Our Moral Obligation to Make Data More Accessible”, because there is a lot of proprietary data out there which only one company/institution has access to, and that stifles innovation (and it’s possible to make data accessible while respecting privacy). This also means that there is potentially a lot of data no (or few) shallow, safe ML algorithms have been tried on, and that we might be able to get a substantial fraction of the benefits of AGI by just doing more with that data.
There are of course downsides to this. Making data more accessible increases the number of applications of AI and could thus lead to increased funding for AGI development.
EDIT: Just realized that this is basically the same as number 4 in The case for Doing Something Else.
Taking for granted that AGI will kill everybody, and taking for granted that this is bad, it’s confusing why we would want to mount costly, yet quite weak, and (poorly) symbolic measures to merely (possibly) slow down research.
Israel’s efforts against Iran are a state effort and are not accountable to the law. What is proposed is a ragtag amateur effort against a state orders of magnitude more powerful than Iran. And make no mistake, AGI research is a national interest. It’s hard to overstate the width of the chasm.
Even gaining a few hours is pretty questionable, and a few hours for a billion people may be a big win or it might not. Is a few seconds for a quadrillion people a big win? What happens during that time and after? It’s not clear that extending the existence of the human race by what is mostly a trivial amount of time even in the scope of a single life is a big deal even if it’s guaranteed.
There is also a pretty good chance that efforts along the lines described may backfire, and spur a doubling-down on AGI research.
Overall this smells like a Pascal’s scam. There is a very, very low chance of success against a +EV of debatable size.
I think you’re overstating the width of the chasm. Where are you getting the impression that congress or the state department is so committed to AGI in a non-symbolic way? Most of the research towards AGI capabilities is done at the moment by private actors, staffed almost entirely by the kind of systematizing nerd most often concerned by existential risk. What exactly is the grand difficulty you expect scaling up the kind of outreach I outlined in the post to a slightly more hostile environment?
I feel like OP has not read the Unabomber’s manifesto, but has reached some of its conclusions independently.
Please don’t try to physically harm AI researchers like the Israelis are alleged to have done to Iranian nuclear / Egyptian rocket scientists. That would spread a lot of misery, and probably not achieve anything you think is good.
I was unaware that the rationalist/less wrong position is one of ‘AI Luddites’, but I guess it makes sense.
Tangential: I am not convinced that the current Israeli approach of “buying time” until a very likely eventual nuclear confrontation is a good approach. If the current leadership was serious about the state’s survival, they would focus on “Iran alignment”, not on Iran containment (similar to AI boxing). The difference is that the Iran alignment problem is actually solvable (e.g. by a voluntary withdrawal from West Bank, working on figuring out a two-state solution...).
AI alignment is much harder than “Iran alignment”, and AI boxing is much harder still, so maybe “Dying with Dignity” in this case would mean delaying the inevitable and hoping for a miracle.
Concerning the first paragraph, I, the government, and most other Israelis disagree with that assessment. Iran has never given any indication that working on a two-state solution would appease them. As mentioned in the OP, Iran’s projected goal is usually the complete removal of Israel, not the creation of a Palestinian state. On the other hand, containment of their nuclear capabilities is definitely possible, as repeated bombings of nuclear facilities can continue on forever, which has been our successful policy since the 1980s (before Iran, there was Iraq, which had its own nuclear program).
Iran is powerful, but Israel is a tiny country that can be easily defeated in other ways.
Nukes are a flex. They cannot be used; if Iran did use them, they might kill more people than just the ones they don’t like. Biological weapons are taboo also, but are a lot easier to use and can be used to the same effect without much trouble.
So you might ask: if Iran is really saying “destroy Israel”, do they mean it and are they acting on it, or are they just working on having more power?
I don’t think Iran can use nukes. Nor will they, nor will they have the ability to deploy them effectively.
I think nukes are only a political weapon. Minus Japan, they have always been used primarily as a political weapon.
And as long as nukes work as a political weapon, they have greater leverage value for leaders than any other weapon.
The narratives about Iran and Israel are of use too. They too are a political weapon.
Thus Israel talking about nuclear weapons is itself a form of weapon.
It’s not clear whether Iran will ever develop, or even aims to develop, nukes. It’s merely “accepted as true” regardless of what we know, and we know very little.
Even in the case of Japan, one could argue the demonstration of power was more a political weapon than a military victory (but that is highly controversial). What is never controversial is that the use of nuclear bombs had more than military implications, which obviously resulted in the arms race and the Cold War. Whether that was good, or even the intent, I cannot say.
AI is not Iran, though. It’s not human, and while I assume it’s aimed to be human-like, its real mission is to combine both human and inhuman qualities.
Therefore we might find that whatever happens in the future is going to be “unknown”.
And also the old famous “unknown unknown”, merely meaning that, for example, AI might never need to use strategies like game theory at all.
So the bottom line is that it’s misguided to assume qualities about AI or its research. Being fearful is not wrong; it’s just that this particular fear is pointless.
Iran would also like to not get nuked by the United States. Why isn’t Iran sabotaging U.S. weapon systems, getting U.S. nuclear engineers to defect, and getting its allies to put sanctions on the U.S. conditional on disarmament?
Sometimes hard power just isn’t the right solution, especially when you are a relatively small fish whose main chance at victory comes from a mutualistic relationship, not an antagonistic one.
The reason they don’t do it is that Iran doesn’t have the state capacity for these actions, nor would Iran be willing to risk an armed conflict with the world’s most powerful military. Iran also doesn’t believe the United States is ever going to nuke it, so it has little motivation to do this in the first place. None of this is analogous from Israel’s perspective, so their actions make strategic sense.
Well, hard power is all they have. Solving the alignment problem would probably be an order of magnitude easier than developing a coherent plan to restore friendly diplomatic relations between Israel and Iran.