Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.
Richard_Ngo
Politics has worked reasonably well for limiting atomic weapons
Politics also worked very well for creating atomic weapons.
“Worth a shot” is the type of conclusion that is best applied to things that have positive-skewed outcomes, but seems to be missing a mood when applied to things that could cause big positive or negative effects.
On the whole, I felt there was more sanity than I expected from politicians.
Conditional on observing that the system as a whole operates at a given level of insanity, if there’s more sanity than you expected in conversations with individual politicians then there’s likely less sanity than you expected in the process by which conversations with individual politicians end up as policy outcomes.
For example, politicians might be better than you expect at saying reassuring things while totally compartmentalizing those statements from their actions (or, indeed, then saying just-as-reassuring things to other people whose beliefs are the polar opposite of yours).
I could debate these details but honestly if we rank all books by “to what extent does the protagonist end up with total power over the world at the end” these seem like they shift Unsong from the top 0.01% to the top 0.02%, or something like that?
For example, yes they’re carrying forward God’s perfect plan because everything in Unsong is. But that plan still involves them conceptualizing the kind of world they want and wielding God’s name to make it happen.
I do take your point, but factory farms are way more temporary than hells, which seems more relevant than the absolute level of suffering in them re whether to take over the world. (They’re also far less bad than any hell portrayed by a creative rationalist author.)
I… am not sure how much gentler the author could have made this, to be honest. A singleton forms and allows diverse values—basically ideal.
This is precisely the attitude I am critiquing, and therefore I don’t find your comment very persuasive.
On the meta level: why are there this many net upvotes and agreement votes for planning horizons of “at most 5 years”? This updates me towards thinking that some aspect of collective epistemics is notably worse than I had been tracking.
Conditional on being around to look back, it seems pretty plausible to me that lack of trust and competence within major powers will have made the outcome of AGI significantly worse than it could have been.
A (partial, not very good) analogy is that, at this point, the developed world is pretty altruistic towards the developing world (e.g. to the tune of many billions of dollars of aid per year). But the developing world might still really wish it’d had fewer internal ethno-religious fractures during the Industrial Revolution (or indeed at any time since then).
Copying over my response to Scott from Twitter (with a few additions in square brackets):
I think my biggest disagreement here is about the concept of strategic communications.
In particular, you claim that MIRI should have been more PR-strategic to avoid hyping AI enough that DeepMind and OpenAI were founded.
Firstly, a lot of this was not-very-MIRI. E.g. contrast Bostrom’s NYT bestseller with Eliezer popularizing AI risk via fanfiction, which is certainly aimed much more at sincere nerds. And I don’t think MIRI planned (or maybe even endorsed?) the Puerto Rico conference.
But secondly, even insofar as MIRI was doing that, creating a lot of hype about AI is also what a bunch of the allegedly PR-strategic people are doing right now! Including stuff like Situational Awareness and AI 2027, as well as Anthropic. [So it’s very odd to explain previous hype as a result of not being strategic enough.]
You could claim that the situation is so different that the optimal strategy has flipped. That’s possible, although I think the current round of hype plausibly exacerbates a US-China race in the same way that the last round exacerbated the within-US race, which would be really bad.
But more plausible to me is the idea that being loud and hype-y is often a kind of self-interested PR strategy which gets you attention and proximity to power without actually making the situation much better, because power is typically going to do extremely dumb stuff in response. And so to me a much better distinction is something like “PR strategies driven by social cognition” (which includes both hyping stuff and also playing clever games about how you think people will interpret you) vs “honest discourse”.
To be clear I don’t have a strong opinion about how much IABIED fits into one category vs the other, seems like a mix. A more central example of the former is Situational Awareness. A more central example of the latter is the Racing to the Precipice paper, which lays out many of the same ideas without the social cognition.
My other big disagreement is about which alignment work will help, and how. Here I have a somewhat odd position of both being relatively optimistic about alignment in general, and also thinking that almost all work in the field is bad. This seems like too big a thing to debate here but maybe the core claim is that there’s some systematic bias which ends up with “alignment researchers” doing stuff that in hindsight was pretty clearly mainly pushing capabilities.
Probably the clearest example is how many alignment researchers worked on WebGPT, the precursor to ChatGPT. If your “alignment research” directly leads to the biggest boost for the AI field maybe ever, you should get suspicious! I have more detailed models of this which I’ll write up later but suffice to say that we should strongly expect Ilya to fall into similar traps (especially given the form factor of SSI) and probably Jan too. So without defusing this dynamic, a lot of your claimed wins don’t stand up.
have we, as the AI Safety community, already lost? That is, have we passed the point of no return, after which becomes both likely and effectively outside of our control?
I think you’re missing a word after “which”. But also, the “outside of our control” part seems like a bad definition of losing, insofar as there are other actors who might be able to steer things instead.
Glad to see these kinds of reflections in general, though.
“Opportunity cost” is another slippery concept that in the economic framework seems similar to other costs, but in a sociopolitical framework seems extremely different.
Suppose I steal $1000 of your stuff. You can describe this as me imposing a $1000 cost on you.
But suppose instead that I offer you $1000 if you quit your job. Assuming you’re happy enough with your job that this doesn’t move you to act, then what I’ve done is just to “impose” a $1000 opportunity cost on you. But of course this does no harm to you.
And so the phrase “opportunity cost” is inherently a misleading one, especially when used as you do above (i.e. talking about “paying” an opportunity cost in the same way that you pay taxes). You have elided the distinction between me freely choosing to optimize for other things than financial returns, versus me having my money taken away from me using the threat of force.
Related: the faction most worried about building superintelligent AI evolved from the faction most worried about not building superintelligent AI.
I broadly agree with this comment too, though not as much as I agree with the other one.
Power felt can also be a kind of honesty—e.g. if a law is backed by force, then it’s often better for this to be unambiguous, so that people can track the actual landscape of power.
(Of course, being unambiguous about how much force backs up your laws can also be a kind of power move. I expect that there are ways to get the benefits of honesty without making it a power move, but I don’t have enough experience with this to be confident.)
In other words, I expect that the kind of inefficiency Val is talking about here is actually sometimes load-bearing for accountability.
Yes, great summary, I fully endorse it.
I claim that there are some people such that, if they were dictators of China, that would be much worse than the current situation. And there are some people such that, if they were dictators of China, that would be much better than the current situation. Which category a given person falls into depends a lot on their honesty, integrity, wisdom, ability to understand political dynamics, ability to resist manipulation, etc.
There are no particular limits I’d want to place on a sufficiently virtuous Superman. E.g. I want Superman to follow a policy that leads him to overthrow the government of China iff he is in the latter category. The big question is how Superman can gain justified confidence that he’s in the latter category, given that unvirtuous people are prone to a lot of self-deception. One way he can do it is by setting limits on his own behavior so that he can gain more evidence about what kind of person he is. E.g. maybe he thinks he’s really wise about politics—wise enough that him having control over US electoral policy is a good idea. If so, he should try to test that wisdom by implementing political change without using violence. If he starts telling you that he doesn’t need to pass such tests, because he’s already so confident that his plan is a good idea, then you should start getting worried.
In other words, when I think about a question like “should Superman forcibly institute electoral reforms to make the US government more functional”, I expect that there are some ways to do this that are really good, and some ways to do this that are really bad. And the kinds of people who are capable of doing it in a really good way (given that they’re Superman) are also generally the kinds of people who wouldn’t need to use much force to make it happen (given that they’re Superman).
Are you saying that if the diplomatic negotiations deteriorate to the point of military action, that means that our hypothetical superman has failed, and he would be better off retiring? Don’t existing legitimate countries go to war for far less noble reasons all the time?
I intended this to refer to scenarios where the US itself (or other leading Western powers) were taking military action against Superman. I care much less about whether he destabilizes North Korea or Eritrea or even countries similar to those but better-governed. But I care a lot about whether he destabilizes the countries I consider the best and most important ones.
Many nations would consider Superman’s property damage to their factories to be an act of war by a foreign power
Maybe. Or maybe they really wouldn’t want to pick a fight with Superman. Or maybe they would issue an angry press release then not do anything. In a setting where Superman holds basically all the cards in terms of physical force, most nations would try quite hard to defuse tensions with him (unless, as I discussed, he’s very unskilled).
I think Alexander Hamilton was the beginning, but this seems like a big step. Vassar talks about how, from the Civil War onwards, the American legal system needed to be optimized to rule a vassal state while also pretending that it wasn’t ruling a vassal state. Can’t remember the specific examples he cited to me but I found it fairly compelling.
Oh, I think of “ending factory farming” as very far from “taking over the world”.
If Superman were a skilled political operator it could be as simple as arranging photoshoots with whichever politicians legislated the end of factory farms.
Or if he were less skilled it could involve doing various kinds of property damage to factory farms (potentially even things which there aren’t laws against, like flying around them in a way which blows the buildings over).
This might escalate to the government trying to arrest him, and outright conflict, but honestly if Superman isn’t skillful enough to defuse that kind of thing, given his influence, then he doesn’t have much business imposing political changes on the world anyway. A politically unskilled and/or unvirtuous Superman trying to end factory farming could quite easily destabilize society in a way that is far worse long-term than letting factory farming end on whatever the natural counterfactual timeline is (without AI, maybe 20 or 30 years?)
Relatedly I’m increasingly coming to believe that this reasoning applies to Lincoln, and that we’d be in a much better position if he’d let the Confederacy secede and then imposed strong economic and moral pressure on them to end slavery.
Eliezer has said that one of the reasons he writes fanfiction is that he doesn’t have to invent the world. All of the horror and badness was already present in the source material.
A large majority of fantasy settings don’t have literal hells as a key component, so I think my point is still applicable to Project Lawful if you replace “design them that way” with the more general “select for that trait”.
I do agree that this is a good point with regards to HPMoR, which is one reason why I didn’t include HPMoR in my original list of examples.
This is pretty straightforwardly not true: there are plenty of academics (for example) who are as smart as rationalists but don’t do very broad instrumental reasoning.
There are also plenty of people who don’t fantasize about becoming all-powerful dictators.
I think that the hunger to become god is an unusually rationalist trait. Honestly it’s somewhat reminiscent of sociopathy, but fortunately few rationalists seem to be sociopaths. However, I do think a sufficient level of fear of death causes some overlapping traits, e.g. a mentality in which more power is crucial to solving problems. (This is not meant in a particularly blame-y way, I’m just as much an example of this as anyone else around here.)
Very interesting.
Relatedly, A Practical Guide to Evil is one of my favorite books/series, and grapples with the tension between trust and power very well. It’s one of the very few narratives I’ve seen written skillfully enough that the protagonist giving up power didn’t seem straightforwardly stupid to me (even when I was in a classic rationalist mindset).
Like I said, I think you’re making two mistakes that cancel out, so I don’t want to try to argue you out of the second mistake. I think the things you’re focusing on are important questions, which I’m also working on myself. I will have some posts coming out explaining my perspective on them soon; in the meantime, the best summary I have is this post.
The main thing I want to point at is that “suppose this is the final year before humanity loses control to AI. What should I do, where should I focus?” is just a bizarre starting point. I expect that if you carefully scrutinize the reasons why you are making your research plan contingent on that supposition, you will find that they are significantly confused.
For example, some people (especially EAs) implicitly reason “there’s a 10% chance of AGI takeover by year X. But a 10% x-risk is really bad! Therefore I should focus my efforts on preventing AGI takeover by year X.” This logic clearly doesn’t stand up even on its own terms. I don’t think you’re making quite that mistake but probably something in the same broad family.
(Probably won’t reply further, since I’m working on some posts that analyze these kinds of mistakes more generally, which seems more productive.)