Imagine the set “the 50 people who would most helpfully contribute to technical alignment research, were they to be working on it, yet who are working on something else instead.” If you had to guess—if you had to make up a story which seems plausible—why are they working on something else instead? And what is it they’re working on?
[Question] Why are the people who could be doing safety research, but aren’t, doing something else?
A better question might be: why are those people who think (and are right) that they could most helpfully contribute to alignment research, and who also think it is the most important issue they could be working on, nonetheless doing something else? (And I doubt that, with these caveats, you will find 50 people like that.)
I don’t know how the premise holds up empirically. Who are the 50 best people in the world for this, and what are they doing?
Attempting to answer the question the author means to ask:
I’m not sure who the best 50 people would be. Perhaps they don’t know they are one of those 50, or they don’t know (how) they can make an impact on it (or they don’t know about it). (I know I don’t think I’m one of those 50 people...)
Plausibly a lot of them have something like Drexler’s or Hanson’s view, such that it doesn’t seem super-urgent & isn’t aligned with their comparative advantage.
I expect most members of the 50, by virtue of being on the list, do have some sort of relevant comparative advantage. But it seems plausible some of them don’t realize that.
Like, obviously I don’t mean the above straightforwardly, which kind of just dodges the question, but I think the underlying generator of it points towards something real. In particular, I think that most of human behavior is guided by habit and following other people’s examples. Very few humans are motivated by any form of explicit argument when it comes to their major life decisions and are instead primarily trying to stabilize their personal life, compete locally to get access to resources, and follow the example that other people around them have set and were socially rewarded for.
Concretely I think that humanity at large, in its choice of what it works on, should be modeled as an extremely sluggish system that tries to make minimal adjustments to its actions unless very strong forces compel it to (the industrial revolution was one such force, which did indeed reshape humanity’s everyday life much more than basically any event before it).
So, most of the people I would like to be working on the important things are following deeply entrenched paths that only shift slowly, driven mostly by habit, local satisficing and social precedent.
I also have this model, and think it predicts a lot of human behavior well. But it doesn’t feel obvious to me that it also predicts well the behavior of these 50, whom I would expect to be unusually motivated by explicit arguments, unusually likely to gravitate toward the most interesting explicit arguments, etc.
I’ve been told, by people much smarter than me, and more connected to even smarter people, that the very elite, in terms of IQ, have a sense of learned helplessness about the world.
According to this story, the smartest people in the world look around and see stupidity all around them: the world is populated by, and controlled by, people who regularly make senseless decisions and can’t even tell that they’re senseless. And it is obvious that trying to get people to understand is hopeless: aside from the fact that most of them basically can’t understand, you are small, and the world is huge.
So these people go and do math, and make a good life for themselves, and don’t worry about the world.
[I don’t know if this story is true.]
Strictly speaking, I agree with you. However, I want to emphasize that I disagree with the idea that this behavior is innate and that there are just some people who happen to be non-conformist. In reality, most top mathematicians simply aren’t exposed to arguments for non-conformity, and this explains the variance in non-conformity far better than any innate tendency to non-conform.
One natural category of answer is that humans are scared of risking social stability. Alignment research is not an avenue that is old and safe within the institutions where most research-like things happen (universities), nor is it really an established avenue at all. Most of the places where it happens are weird and new and not part of any establishment.
Then again, OpenAI and DeepMind are exciting-yet-fairly-safe places, and it’s not like the best mathematicians of our day are knocking down their doors looking for a prestigious job where they can also work on alignment. I guess for those people I do think that this type of optimisation is a somewhat alien thought process. They primarily pick topics that they find interesting, not important. It’s one thing to argue that they should be working in a field, it’s another thing to get them fascinated by it.
(I do think the Embedded Agency sequence is one of the best things that exists for building curiosity about bounded rationality, and am curious to hear of any good mathematicians/computer scientists/physicists who read it and what they feel about the problems contained therein.)
Somehow I hadn’t particularly grokked until just now (or maybe had forgotten?) the idea that “the thing to do here is make mathematicians think alignment is interesting”.
I think this is a good candidate answer, but I feel confused by (what seems to me like) the relative abundance of historical examples of optimization-type behavior among scientists during pivotal periods in the past. For example, during WWII there were some excellent scientists (e.g. Shannon) who only grudgingly pursued research that was “important” rather than “interesting.” But there were many others (e.g. Fermi, Szilard, Oppenheimer, Bethe, Teller, Von Neumann, Wigner) who seemed to grok the stakes. To be interested in some things mostly because of their importance, to ruthlessly prioritize, to actually try.
Epistemic status: Making guesses, though the conclusion feels right to me.
Is it the case that many of these people ‘actually tried’ before (a) the problem became primarily an engineering problem and (b) it was a crucial project during wartime and backed by the establishment?
This is probably a confused analogy, but I’ll say it anyway: if they didn’t, the analogy would be that in a world where there was war fought using advanced machine learning, there might be a similar set of top researchers doing research into adversarial examples, reward hacking and other concrete problems in ML—only doing things pretty closely related to actually building things, and only building things close to military applications.
This has many implications, but if you take MIRI’s view that new basic theory of embedded agency / grounding for optimisation is required, the other side of the analogy is that there was e.g. no team rushing to unify quantum mechanics and gravity during WWII.
So I suppose that (to MIRI) the problem seems more abstract, and less suited to human brains to reason about, than even the situation with the atom bomb.
I think that if you think this, you should update hard against promoting the open problems as ‘important’ and instead focus on making the problem ‘interesting’.
(Would appreciate someone chiming in about whether (a) and (b) are accurate.)
I think nuclear physics then had more of an established paradigm than AI safety has now; from what I understand, building a bomb was considered a hard, unsolved problem, but one which it was broadly known how to solve. So I think the answer to (a) is basically “no.”
A bunch of people on the above list do seem to me to have actually tried before the project was backed by the establishment, though—from what I understand Fermi, Szilard, Wigner and Teller were responsible for getting the government involved in the first place. But their actions seem mostly to have been in the domains of politics, engineering and paradigmatic science, rather than new-branch-of-science-style theorizing.
(I do suspect it might be useful to find more ways of promoting the problem chiefly as interesting).
Speaking from my experience, my sense is indeed that people who think it’s important and interesting and who are resilient to social change have been able to make the leap to doing alignment research and been incredibly impactful, but that it should be a red flag when people think it’s important without the understanding/curiosity or social resilience. They can take up a lot of resources while falling into simple error modes.
Strongly agree. Awareness of this risk is, I think, the reason for some of CFAR’s actions that most-often confuse people—not teaching AI risk at intro workshops, not scaling massively, etc.
Example answers which strike me as plausible:
Most members of this set simply haven’t yet encountered one of the common attractors—LessWrong, CFAR, Superintelligence, HPMOR, 80k, etc. Perhaps this is because they don’t speak English, or because they’re sufficiently excited about their current research that they don’t often explore beyond it, or because they’re 16 and can’t psychologically justify doing things outside the category “prepare for college,” or because they’re finally about to get tenure and are actively trying to avoid getting nerd sniped by topics in other domains, or because they don’t have many friends and so only get introduced to new topics they think to Google. Or perhaps, despite being exactly the sort of person who would get nerd sniped by this problem if they’d ever encountered it, they just… never have—not even the basic “maybe it will be a problem if we build machines smarter than us, huh?” And maybe it shouldn’t be much more surprising that there still exist pockets of extremely smart people who’ve never thought to wonder this, than that there presumably existed pockets of smart people for millennia who never thought to wonder what effects might result from more successful organisms reproducing more.
Most members of this set have encountered one of the common attractors, or at least the basic ideas, but only in some poor and limited form that left them idea-inoculated. Maybe they heard Kurzweil make a weirdly specific claim once, or the advisor they really respect told them the whole field is pseudoscience that assumes AI will have human-like consciousness and drives to power, or they tried reading some of Eliezer’s posts and hated the writing style, or they felt sufficiently convinced by an argument for super-long timelines that investigating the issue further didn’t feel decision-relevant.
The question is ill-formed: perhaps there just aren’t 50 people who could helpfully contribute who aren’t doing so already. Or perhaps the framing of the question implies the “50” is the relevant thing to track, whereas actually research productivity is power-law-ish, the vast majority of the benefit would come from finding just one or three particular members of this set, and finding them would require asking different questions.
Discounting. There is no law of nature that can force me to care about preventing human extinction years from now more than about eating a tasty sandwich tomorrow. There is also no law that can force me to care about human extinction much more than about my own death.
There are, of course, more technical disagreements to be had. Reasonable people could question how bad unaligned AI will be, or how much progress is possible in this research. But unlike those questions, discounting is not debatable.
“Not debatable” seems a little strong. For example, one might suspect both that some rational humans disprefer persisting, and also that most who think this would change their minds upon further reflection.
While it’s true that preferences are not immutable, the things that change them are not usually debate. Sure, some people can be made to believe that their preferences are inconsistent, but then they will only make the smallest correction needed to fix the problem. Also, sometimes debate will make someone claim to have changed their preferences just so that they can avoid social pressure (e.g. “how dare you not care about starving children!”), but this may not be reflected in their actions.
Regardless, my claim is that many (or most) people discount a lot, and that this would be stable under reflection. Otherwise we’d see more charity, more investment, and more work on e.g. climate change.