This is a bit of a rant, but I notice that I am confused.
Eliezer said in the original Sequences that rationality is systematized winning, that rationalists should win.
But it’s pretty obvious that LessWrong is not about winning (and Eliezer provides a more accurate definition of what he means by rationality here). As far as I can tell, LW is mostly about cognitive biases and algorithms/epistemology (the topic of Eliezer’s Sequences), self-help, and a lot of AI alignment.
But LW should be about winning! LW has the important goal of solving alignment, so it should care a lot about the most efficient way to go about it, in other words, about how to win, right?
So what would it look like if LW had a winning attitude towards alignment?
Well, I think this is where the distinction between the two styles of rationality (cognitive algorithm development vs. winning) matters a lot. If you want to solve alignment and want to be efficient about it, it seems obvious that there are better strategies than researching the problem yourself: rather than spend 3+ years on a PhD (cognitive rationality), get 10 other people to work on the issue (winning rationality). That alone 10xes your efficiency.
My point is that we should consider all strategies when solving a problem. Not only the ones that focus directly on the problem (cognitive rationality/researching alignment), but also the ones that involve acquiring a lot of resources and spending these to solve the problem (winning rationality/getting 10 other people to research alignment).
This is especially true when other strategies get you orders of magnitude more leverage on the problem. To pick an extreme example, who do you think has more capacity to solve alignment, Paul Christiano, or Elon Musk? (hint: Elon Musk can hire a lot of AI alignment researchers).
I am confused because LW teaches cognitive rationality, so it should notice all this, recognize that epistemology, cognitive biases, and a direct approach are not the most efficient way to tackle alignment (or any ambitious goal), and start studying how people actually win in the real world.
But it’s not happening (well, not much at least).
As far as I can tell, cognitive rationality helps, but winning seems to be mostly about agency and power. So maybe LW should talk more about these (and how to use them for good)?
a) what makes you think this isn’t already what’s happening? (I think it’s actually happened a lot)
b) I think we’ve historically actually overindexed on the kinds of things you talk about here; much of it has turned out to be very bad for the world IMO, and the good parts are still much harder and more complicated than you’re implying here.
(This comment ended up a little more aggro than I meant it to; I think it’s fairly reasonable to come in with the question you have here, but I do think the assumption behind it is fairly wrong on two levels.)
There’s been a bunch of fieldbuilding work, starting with the MIRI (then Singularity Institute) Summer Fellows program, and in many ways the founding of CFAR, as well as AIRCS, MATS, and PIBBSS. (CFAR both included a fairly major focus on “winning” and was in significant part an effort to recruit people capable of working on the alignment problem.)
In 2014, this included getting Elon Musk involved, which AFAICT contributed nontrivially to OpenAI getting created, which was IMO very bad for the world. Later, the person who seemed maybe on track to have a lot of real-world power/winning was Sam Bankman-Fried, who turned out to destroy $8 billion, burn a lot of bridges, and be hugely net negative.
It’s not enough to say “work on AI and alignment”; you need to successfully convey the subtleties of what that actually means. Today, there are programs that scale the reasonably-scalable parts of the AI safety field, but those parts generally aren’t the most difficult and bottlenecky ones. And it’s still a fairly common outcome for people in those programs to end up joining frontier labs, doing work that is IMO net negative.
The work that needs doing for alignment is just actually very hard; many people working on the harder parts have tried and failed to scale it.
(Also, note, the whole reason I wrote Rationality is not (exactly) Winning is that this was a very common focus that needed to be argued against. It turns out that when you focus on winning, you get power-seeking and bad epistemics fairly quickly.)
None of this is to say winning isn’t important or even in some sense the most important part of rationality, just that overly focusing on it has predictable problems.
See:
The Field of AI Alignment: A Postmortem, and What To Do About It
AI alignment researchers don’t (seem to) stack
What Money Cannot Buy
Note: I have not read the linked posts yet, will do that later
a) I don’t see many posts along the lines of “What do you think of [some strategy that doesn’t involve direct research on alignment]?” (maybe getting influence in your local town hall, or university, etc.). Perhaps you can point me to such posts? Likewise, I don’t see many experience reports like “I paused alignment research and went down this other route instead, hoping for an efficiency multiplier. Here’s what worked and here’s what didn’t.”
I am not saying these posts never happen, but given the possible leverage, I would expect to see more of them. I think it’s fair to say there are a lot more posts about direct research than about other (leveraged) ways to approach the issue. For example, in my LW feed right now there are 3.5 posts about alignment (highlighted), 3.5 about AI, and none about other strategies (the post “Lessons from the Iraq War for AI policy” is still pretty far from that, as it does not discuss something like a career path or actions an individual can take).
You say these have happened a lot, but I don’t see them discussed much on LW. LW itself can be characterized as Eliezer’s very successful leveraged strategy to bring more people into alignment research, so maybe the leveraged strategies end up discussed more outside LW? But in any case, this at least shows that some leveraged strategies work, so maybe they are worth discussing more.
b) I think this can be summarized as “we don’t know how to put more resources into alignment without this having (sometimes very) negative unintended outcomes”. Okay, fair enough, but this seems like a huge issue, and maybe there should be more posts about exploring and finding leveraged strategies that won’t backfire. The same goes for power-seeking: power is an instrumental goal of ASI precisely because it is useful for accomplishing almost any goal, so it’s important to figure out good ways to get and use it.
Now maybe your answer is something like “we tried, it didn’t work out that well, so we re-prioritized accordingly”. But it’s not obvious to me that we shouldn’t try more and develop a better map of all the available options. Anyway, I will read what you linked; if you have more links that would clarify what was tried and what did or didn’t work, don’t hesitate to share.
I think these mostly don’t take the form of “posts” because they mostly involve actually going and forming organizations, coordinating, and running stuff. (Maybe see Dark Forest Theories: most of the discussion is happening in places you can’t see, because it’s pretty high-context and not that useful to have randos in it.)
There was a lot more explicit discussion of this sort of thing 10 years ago, during the early days of the EA movement. Right now I think it’s a combo of a) those conversations mostly turned into professional orgs doing stuff, and b) we’re in a period where it’s more obvious that there were significant problems with this focus, so there’s a bit of a reaction against it.
Also, note: if your plan to recruit more people is working, you should still expect to see mostly posts on the object level. Like, if you didn’t successfully get 10x or 100x the people working on the object level, that would indicate your plan to scale had failed.
My understanding is that Eliezer himself does not view it as hugely successful. MIRI thinks that ~nobody in LW-adjacent communities is doing useful alignment work, and my expectation is that Eliezer would agree with this post of John’s regarding the state of the field.
Simultaneously, the proliferation of talk about the AI alignment problem, which was ~necessary to kickstart the field, potentially dramatically decreased the time-to-omnicide. It attracted the attention of various powerful people whose contributions were catastrophically anti-helpful, from those who were probably well-meaning but misunderstood the problem (Elon Musk) to straight-up power-hungry psychopaths (Sam Altman).
I overall agree that “getting dramatically more people to work on alignment” is a good initial idea. But it seems that what actually happens when you try to proliferate talk about the problem is that most people end up misunderstanding it and either working on the wrong problems or actively making things worse. This is of course fundamentally a skill issue on the part of the proliferators, but the level of skill at which this doesn’t happen may be really high, and as you try to get better at it, you leave net-negative memetic infections in your wake. Plus, you may not actually get to iterate indefinitely: there are only so many Silicon Valleys and so many billionaires.
So the “recruit more people to work on the problem” strategy that would actually be effective in practice probably looks more like “look for promising people and recruit them manually, one by one”, instead of anything higher-leverage and higher-profile. One wonders whether the counterfactual timeline in which MIRI instead quietly focused on research and this more targeted recruiting is doing better than this one.
Possibly not. Possibly that primordial awareness-raising effort is going to provide the foundation for an international AGI-research ban. But I don’t think it’s clear that this was the better plan, in hindsight.
I agree in theory, but this has happened a lot. You’re arguing “the amount that should happen is non-zero”, which is uncontroversially true IMO, but the hard question is “what fraction of the community’s effort should be spent on field building vs. directly doing things”. I think the answer is clearly not 100%, so the key question becomes figuring out whether it currently happens too much or too little. It sounds like you’re maybe not aware of a bunch of the field building work that has happened/is happening; I think you need to get a better sense of that in order to have an informed take on strategic errors or the correct allocation of resources.
This style of thinking is roughly the right way to approach the problem, although I also agree with John Wentworth’s comment that not enough people are directly trying to solve it.
You are getting fairly negative reactions because you appear to be unfamiliar with a lot of the existing ways in which people are already trying to win. For example, some people are now trying to influence government policy. But if you can think of a good alternative that people are not already doing, then you should suggest it.
Isn’t the most upvoted curated post right now about winning? “A case for courage, when speaking of AI danger” is about strategy, not technical research.
If you’re looking for people interested in personal strategies for individuals (e.g. earning to give), I think most of them are on the Effective Altruism Forum rather than LessWrong. The network effect means that everyone interested in a topic tends to cluster in one forum, even if they are given two choices initially.
Another speculative explanation is that the upvote system allows the group of people interested in one particular topic (e.g. technical research, or conceptual theorization) to upvote every post on that topic without running out of upvotes. This rewards people for repeatedly writing posts on the most popular topics, since it’s much easier to get net positive upvotes that way.
PS: I agree that earning to give is reasonable
I’m considering this myself right now :)
I mostly agree with you that hiring experts and having a great impact is feasible. Many of the technical alignment researchers who lament “money isn’t what we need, what we need is to be going in the right direction instead of having so much fake research!” fail to realize that their own salaries also come from those flawed but nonetheless vital funding sources. If it weren’t for the flawed funding sources, they would have nothing at all.
Some of them might be wealthy enough to fund themselves, but that’s effectively still making money to hire experts (the expert is themselves).
And yes, some people use AI safety careers as a stepping stone to AI capabilities careers. But realistically, the whole world spends less than $0.2 billion on AI safety and hundreds of billions on AI capabilities; AI safety salaries are negligible here. One might argue that the non-monetary moral motivation of working on AI safety has caused people to end up working on AI capabilities, but in that case increasing AI safety salaries should reduce this flow rather than increase it.
But Raemon is so right about the great danger of being net negative. Don’t follow an “ends justify the means” strategy like Sam Bankman-Fried, and beware of your ego convincing you that AI is safer so long as you’re the guy in charge (like Sam Altman or Elon Musk). These biases are insidious, because we are machines programmed by evolution, not to seek truth for the sake of truth, but to:
Arrive at the truth when it increases inclusive fitness
Arrive at beliefs which get us to do evil while honestly believing we are doing good (when it increases inclusive fitness)
Arrive at said beliefs, despite wholly believing we are seeking the truth