One natural category of answer is that humans are scared of risking social stability. Alignment research is not an avenue that is old and safe within the institutions where most research-like things happen (universities), nor is it really an avenue at all. Most of the places where it happens are weird and new and not part of any establishment.
Then again, OpenAI and DeepMind are exciting-yet-fairly-safe places, and it’s not like the best mathematicians of our day are knocking down their doors looking for a prestigious job where they can also work on alignment. I guess for those people I do think that this type of optimisation is a somewhat alien thought process. They primarily pick topics that they find interesting, not important. It’s one thing to argue that they should be working in a field, it’s another thing to get them fascinated by it.
(I do think the Embedded Agency sequence is one of the best things that exists for building curiosity about bounded rationality, and am curious to hear of any good mathematicians/computer scientists/physicists who read it and what they feel about the problems contained therein.)
Somehow I hadn’t particularly grokked until just now (or maybe had forgotten?) the idea that “the thing to do here is make mathematicians think alignment is interesting”.
I think this is a good candidate answer, but I feel confused by (what seems to me like) the relative abundance of historical examples of optimization-type behavior among scientists during pivotal periods in the past. For example, during WWII there were some excellent scientists (e.g. Shannon) who only grudgingly pursued research that was “important” rather than “interesting.” But there were many others (e.g. Fermi, Szilard, Oppenheimer, Bethe, Teller, Von Neumann, Wigner) who seemed to grok the stakes. To be interested in some things mostly because of their importance, to ruthlessly prioritize, to actually try.
Epistemic status: Making guesses, though the conclusion feels right to me.
But there were many others (e.g. Fermi, Szilard, Oppenheimer, Bethe, Teller, Von Neumann, Wigner) who seemed… to truly grok the stakes.
Is it the case that many of these people ‘actually tried’ before (a) the problem became primarily an engineering problem and (b) it was a crucial project during wartime and backed by the establishment?
This is probably a confused analogy, but I’ll say it anyway: if they didn’t, the analogue would be a world in which a war was fought using advanced machine learning, and a similar set of top researchers did research into adversarial examples, reward hacking and other concrete problems in ML—only doing things pretty closely related to actually building things, and only building things close to military applications.
This has many implications, but if you take MIRI’s view that new basic theory of embedded agency / grounding for optimisation is required, the other side of the analogy is that there was e.g. no team rushing to unify quantum mechanics and gravity during WWII.
So I suppose (to MIRI) the problem seems more abstract, and less built for human brains to reason about, than even the situation with the atom bomb.
I think that if you think this, you should update hard against promoting the open problems as ‘important’ and instead focus on making the problem ‘interesting’.
(Would appreciate someone chiming in about whether (a) and (b) are accurate.)
I think nuclear physics then had more of an established paradigm than AI safety has now; from what I understand, building a bomb was considered a hard, unsolved problem, but one which it was broadly known how to solve. So I think the answer to (a) is basically “no.”
A bunch of people on the above list do seem to me to have actually tried before the project was backed by the establishment, though—from what I understand Fermi, Szilard, Wigner and Teller were responsible for getting the government involved in the first place. But their actions seem mostly to have been in the domains of politics, engineering and paradigmatic science, rather than new-branch-of-science-style theorizing.
(I do suspect it might be useful to find more ways of promoting the problem chiefly as interesting.)
Speaking from my experience, my sense is indeed that people who think it’s important and interesting, and who are resilient to social change, have been able to make the leap to doing alignment research and have been incredibly impactful, but that it should be a red flag when people think it’s important without the understanding/curiosity or the social resilience. They can take up a lot of resources while falling into simple error modes.
Strongly agree. Awareness of this risk is, I think, the reason for some of CFAR’s actions that most-often confuse people—not teaching AI risk at intro workshops, not scaling massively, etc.