Another semi-assumption this makes is that most normies (by which I mean people working on neither capabilities nor safety), when they hear about this issue, will instinctively try their hand at alignment.
In my experience, this just isn’t the case. If I manage to convey that there’s a huge problem at all, they also realize that alignment is an extremely dangerous game and figure they’re not going to be successful at it. They might ask whether they can fetch coffee for someone at a MIRI-equivalent or do their generalist programming work, because people do have an instinct to help, but they explicitly stay away from doing critical safety research. What they generally want to do instead is tell their friends and family about the problem and help slow down the field, which he mentions in B. You might get a different set of responses if you’re constantly talking to bright young mathematicians who believe their comparative advantage is doing math, but I’ve been pretty careful to check the behavior of the people I’m doing outreach to, to make sure they’re not exacerbating the problem.
And there’s a difference between the kinds of “safety” work that dilute the quality of alignment research if not done exceptionally well and the kinds of “safety” work that involve regulating, lobbying against, or slowing down existing AI companies. Barring some bizarre second-order effects that have not been coherently argued for on LW, I think more people pressuring large tech companies to make their work “safer” is a good thing, and very obviously so. If the normies succeed in actually pushing DeepMind & crew toward operational adequacy, fantastic! If they don’t, well, at least those teams are wasting money/time operationally on something other than ending the world, and when money/time has been allocated inefficiently toward a problem, it’s still generally (though not always) easier to reform existing efforts than to start from scratch.