Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.
Richard_Ngo
Yes, I disagree voted; if I were to verbalize the disagreement, it would be something like “I disagree that you understand the context / goals of the conference enough to make reasonable suggestions about it like this, and this particular suggestion is probably quantitatively pretty incorrect (though as I noted, in a sense slightly directionally correct)”.
Got it. On a meta level I don’t think you’re using the disagree vote correctly. The point of disagree votes IMO is to disentangle “am I glad this comment was made” from “is this comment factually correct”. In this case it seems like you thought my comment was at least somewhat correct, but that you thought I didn’t have the standing required to make this kind of comment.
I would suggest that downvoting the comment is a better way to convey your position. Though I also think that it’s one of LW’s most valuable norms that it doesn’t gatekeep very much who has social permission to make comments or criticisms, so in your position I wouldn’t downvote.
My main takeaway here is that I should probably write about updates that I made based on running the Alignment Workshop (which, despite its name, was closer in function to what you’re calling a “conference”), so I have a canonical thing to point people to.
I would suggest the possibility of attending and then simply not attending talks. I believe that for any given talk, there will be many people not attending that talk
My experience of workshops like this one makes me expect that ~90% of people will attend a talk during most of the talk slots, but I’m open to being wrong.
On the general topic of conference design, I’m not sure how strongly I believe the recommendations; my guess is there are many desiderata you’re simply not modeling.
FWIW my advice here is partly based on my experience running the original Alignment Workshop, which had many of the same desiderata as your conference (though of course there are many important differences).
Also, out of curiosity, was it you who −9 disagree-voted my comment?
I considered attending after seeing this but was put off by how full of talks the schedule was.
Perhaps too late by now, but I’d strongly suggest reading this post and implementing some of the recommendations.
When people try to be more ambitious it often makes their research worse, because it closes them off to interesting research directions that they don’t yet understand how to scale.
I’m more excited about encouraging people to do creative and beautiful research. Here’s an example of creative research, and here’s an example of beautiful research, both from Jascha Sohl-Dickstein. In the long term I expect creative and beautiful research to uncover much more interesting and important phenomena.
In general I find that I can trace losses in Go games to moments when I acted unvirtuously (e.g. greedily, impatiently, fearfully, arrogantly, etc).
Go is also a long enough game that one mistake seldom sinks you (as long as you’re willing to give up the sunk cost).
My argument didn’t rely on the idea that it’s unrealistic, but rather the jump from “this is realistic” to “this should be my focus”.
Like I said, I think you’re making two mistakes that cancel out, so I don’t want to try to argue you out of the second mistake. I think the things you’re focusing on are important questions, which I’m also working on myself. I will have some posts coming out explaining my perspective on them soon; in the meantime, the best summary I have is this post.
The main thing I want to point at is that “suppose this is the final year before humanity loses control to AI. What should I do, where should I focus?” is just a bizarre starting point. I expect that if you carefully scrutinize the reasons why you are making your research plan contingent on that supposition, you will find that they are significantly confused.
For example, some people (especially EAs) implicitly reason “there’s a 10% chance of AGI takeover by year X. But a 10% x-risk is really bad! Therefore I should focus my efforts on preventing AGI takeover by year X.” This logic clearly doesn’t stand up even on its own terms. I don’t think you’re making quite that mistake but probably something in the same broad family.
(Probably won’t reply further, since I’m working on some posts that analyze these kinds of mistakes more generally, which seems more productive.)
Politics has worked reasonably well for limiting atomic weapons
Politics also worked very well for creating atomic weapons.
“Worth a shot” is the type of conclusion that is best applied to things that have positive-skewed outcomes, but seems to be missing a mood when applied to things that could cause big positive or negative effects.
On the whole, I felt there was more sanity than I expected from politicians.
Conditional on observing that the system as a whole operates at a given level of insanity, if there’s more sanity than you expected in conversations with individual politicians then there’s likely less sanity than you expected in the process by which conversations with individual politicians end up as policy outcomes.
For example, politicians might be better than you expect at saying reassuring things while totally compartmentalizing those statements from their actions (or, indeed, then saying just-as-reassuring things to other people whose beliefs are the polar opposite of yours).
I could debate these details but honestly if we rank all books by “to what extent does the protagonist end up with total power over the world at the end” these seem like they shift Unsong from the top 0.01% to the top 0.02%, or something like that?
For example, yes they’re carrying forward God’s perfect plan because everything in Unsong is. But that plan still involves them conceptualizing the kind of world they want and wielding God’s name to make it happen.
I do take your point, but factory farms are way more temporary than hells, which seems more relevant than the absolute level of suffering in them re whether to take over the world. (They’re also far less bad than any hell portrayed by a creative rationalist author.)
I… am not sure how much gentler the author could have made this, to be honest. A singleton forms and allows diverse values— basically ideal.
This is precisely the attitude I am critiquing, and therefore I don’t find your comment very persuasive.
On the meta level: why are there this many net upvotes and agreement votes for planning horizons of “at most 5 years”? This updates me towards thinking that some aspect of collective epistemics is notably worse than I had been tracking.
Conditional on being around to look back, it seems pretty plausible to me that lack of trust and competence within major powers will have made the outcome of AGI significantly worse than it could have been.
A (partial, not very good) analogy is that, at this point, the developed world is pretty altruistic towards the developing world (e.g. to the tune of many billions of dollars of aid per year). But the developing world might still really wish it’d had fewer internal ethno-religious fractures during the Industrial Revolution (or indeed at any time since then).
Copying over my response to Scott from Twitter (with a few additions in square brackets):
I think my biggest disagreement here is about the concept of strategic communications.
In particular, you claim that MIRI should have been more PR-strategic to avoid hyping AI enough that DeepMind and OpenAI were founded.
Firstly, a lot of this was not-very-MIRI. E.g. contrast Bostrom’s NYT bestseller with Eliezer popularizing AI risk via fanfiction, which is certainly aimed much more at sincere nerds. And I don’t think MIRI planned (or maybe even endorsed?) the Puerto Rico conference.
But secondly, even insofar as MIRI was doing that, creating a lot of hype about AI is also what a bunch of the allegedly PR-strategic people are doing right now! Including stuff like Situational Awareness and AI 2027, as well as Anthropic. [So it’s very odd to explain previous hype as a result of not being strategic enough.]
You could claim that the situation is so different that the optimal strategy has flipped. That’s possible, although I think the current round of hype plausibly exacerbates a US-China race in the same way that the last round exacerbated the within-US race, which would be really bad.
But more plausible to me is the idea that being loud and hype-y is often a kind of self-interested PR strategy which gets you attention and proximity to power without actually making the situation much better, because power is typically going to do extremely dumb stuff in response. And so to me a much better distinction is something like “PR strategies driven by social cognition” (which includes both hyping stuff and also playing clever games about how you think people will interpret you) vs “honest discourse”.
To be clear I don’t have a strong opinion about how much IABIED fits into one category vs the other, seems like a mix. A more central example of the former is Situational Awareness. A more central example of the latter is the Racing to the Precipice paper, which lays out many of the same ideas without the social cognition.
My other big disagreement is about which alignment work will help, and how. Here I have a somewhat odd position of both being relatively optimistic about alignment in general, and also thinking that almost all work in the field is bad. This seems like too big a thing to debate here but maybe the core claim is that there’s some systematic bias which ends up with “alignment researchers” doing stuff that in hindsight was pretty clearly mainly pushing capabilities.
Probably the clearest example is how many alignment researchers worked on WebGPT, the precursor to ChatGPT. If your “alignment research” directly leads to the biggest boost for the AI field maybe ever, you should get suspicious! I have more detailed models of this which I’ll write up later but suffice to say that we should strongly expect Ilya to fall into similar traps (especially given the form factor of SSI) and probably Jan too. So without defusing this dynamic, a lot of your claimed wins don’t stand up.
have we, as the AI Safety community, already lost? That is, have we passed the point of no return, after which becomes both likely and effectively outside of our control?
I think you’re missing a word after “which”. But also, the “outside of our control” part seems like a bad definition of losing, insofar as there are other actors who might be able to steer things instead.
Glad to see these kinds of reflections in general, though.
“Opportunity cost” is another slippery concept that in the economic framework seems similar to other costs, but in a sociopolitical framework seems extremely different.
Suppose I steal $1000 of your stuff. You can describe this as me imposing a $1000 cost on you.
But suppose instead that I offer you $1000 if you quit your job. Assuming you’re happy enough with your job that this doesn’t move you to act, then what I’ve done is just to “impose” a $1000 opportunity cost on you. But of course this does no harm to you.
And so the phrase “opportunity cost” is inherently a misleading one, especially when used as you do above (i.e. talking about “paying” an opportunity cost in the same way that you pay taxes). You have elided the distinction between me freely choosing to optimize for other things than financial returns, versus me having my money taken away from me using the threat of force.
Related: the faction most worried about building superintelligent AI evolved from the faction most worried about not building superintelligent AI.
I broadly agree with this comment too, though not as much as I agree with the other one.
Power felt can also be a kind of honesty—e.g. if a law is backed by force, then it’s often better for this to be unambiguous, so that people can track the actual landscape of power.
(Of course, being unambiguous about how much force backs up your laws can also be a kind of power move. I expect that there are ways to get the benefits of honesty without making it a power move, but I don’t have enough experience with this to be confident.)
In other words, I expect that the kind of inefficiency Val is talking about here is actually sometimes load-bearing for accountability.
Yes, great summary, I fully endorse it.
I broadly agree, and it’s useful to have such a fleshed-out list.
Though note that once we’re talking about highly non-fungible/non-commensurable effects, the term “costs” might be misleading. As a toy example, you might intuitively assume that needing someone else to put effort into safety costs “political capital”. But suppose Franklin was right when he wrote “He that has once done you a kindness will be more ready to do you another, than he whom you yourself have obliged.” Then the implicit assumptions behind talking in terms of political costs and capital break down.