I found the tone of this post annoying at times, especially for overgeneralizing (“the alignment community” is not monolithic) and exaggerating / mocking (e.g. the title). But I’m trying to look past that to the substance, of which there’s plenty :)
I think everyone agrees that you should weigh costs and benefits in any important decision. And everyone agrees that the costs and benefits may be different in different specific circumstances. At least, I sure hope everyone agrees with that! Also, everyone is in favor of accurate knowledge of costs and benefits. I will try to restate the points you’re making, without the sass. But these are my own opinions now. (These are all pro tanto, i.e. able to be outweighed by other considerations.)
If fewer people who care about AI risk join or found leading AI companies, then there will still be leading AI companies in the world, it’s just that a smaller fraction of their staff will care about AI risk than otherwise. (Cf. entryism.) (On the other hand, presumably those companies would make less progress on the margin.)
And one possible consequence of “smaller fraction of their staff will care about AI risk” is less hiring of alignment researchers into these companies. Insofar as working with cutting-edge models and knowhow is important for alignment research (a controversial topic, where I’m on the skeptical side), and insofar as philanthropists and others aren’t adequately funding alignment research (unfortunately true), then it’s especially great if these companies support lots of high-quality alignment research.
If fewer people who care about AI risk go about acquiring a reputation as a prestigious AI researcher, or acquiring AI credentials like PhDs, then there will still be prestigious and credentialed AI researchers in the world, it’s just that a smaller fraction of them will care about AI risk than otherwise. (On the other hand, presumably those prestigious researchers would collectively make less progress on the margin.)
This and the previous bullet point are relevant to public perceptions, outreach, legislation efforts, etc.
MIRI has long kept some of their ideas secret, and at least some alignment people think that some MIRI people are “overly cautious with infosec”. Presumably at least some other people disagree, or they wouldn’t have that policy. I don’t know the secrets, so I am at a loss to figure out who’s in the right here. Incidentally, the OP implies that the stuff MIRI is choosing not to publish is “alignment research”, which I think is at least not obvious. One presumes that the ideas might bear on alignment research, or else they wouldn’t be thinking about it, but I think that’s a weaker statement, at least for the way I define “alignment research”.
If “the first AGI and later ASI will be built with utmost caution by people who take AI risk very seriously”, then that’s sure a whole lot better than the alternative, and probably necessary for survival, but experts strongly disagree about whether it’s sufficient for survival.
(The OP also suggests that this is the course we’re on. Well that depends on what “utmost caution” means. One can imagine companies being much less cautious than they have been so far, but also, one can imagine companies being much more cautious than they have been so far. E.g. I’d be very surprised if OpenAI etc. lives up to these standards, and forget about these standards. Regardless, we can all agree that more caution is better than less.)
A couple of alignment people have commented that interpretability research can have problematic capabilities externalities, although neither was saying that all of it should halt right now or anything like that.
(For my part, I think interpretability researchers should weigh the costs and benefits of doing the research, and also the costs and benefits of publishing the results, on a case-by-case basis, just like everyone else should.)
The more that alignment researchers freely share information, the better and faster they will produce alignment research.
If alignment researchers are publishing some stuff, but not publishing other stuff, then that’s not necessarily good enough, because maybe the latter stuff is essential for alignment.
If fewer people who care about AI risk become VCs who invest in AI companies, or join AI startups or AI discord communities or whatever, then “the presence of AI risk in all of these spaces will diminish”.
I think some of these are more important and some are less, but all are real. I just think it’s extraordinarily important to be doing things on a case-by-case basis here. Like, let’s say I want to work at OpenAI, with the idea that I’m going to advocate for safety-promoting causes, and take actions that are minimally bad for timelines. OK, now I’ve been at OpenAI for a little while. How’s it going so far? What exactly am I working on? Am I actually advocating for the things I was hoping to advocate for? What are my prospects going forward? Am I being indoctrinated and socially pressured in ways that I don’t endorse? (Or am I indoctrinating and socially pressuring others? How?) Etc. Or: I’m a VC investing in AI companies. What’s the counterfactual if I wasn’t here? Do I actually have any influence over the companies I’m funding, and if so, what am I doing with that influence, now and in the future?
Anyway, one big thing that I see as missing from the post is the idea:
“X is an interesting AI idea, and it’s big if true, and more importantly, if it’s true, then people will necessarily discover X in the course of building AGI. OK, guess I won’t publish it. After all, if it’s true, then someone else will publish it sooner or later. And then—and only then—I can pick up this strand of research that depends on X and talk about it more freely. Meanwhile maybe I’ll keep thinking about it but not publish it.”
In those kinds of cases, not publishing is a clear win IMO. More discussion at “Endgame safety” for AGI. For my part, I say that kind of thing all the time.
And I publish some stuff where I think the benefits of publishing outweigh the costs, and don’t publish other stuff where I think they don’t, on a case-by-case basis, and it’s super annoying and stressful and I never know (and never will know) if I’m making the right decisions, but I think it’s far superior to blanket policies.
> I just think it’s extraordinarily important to be doing things on a case-by-case basis here. Like, let’s say I want to work at OpenAI, with the idea that I’m going to advocate for safety-promoting causes, and take actions that are minimally bad for timelines.
Notice that this is phrasing AI safety and AI timelines as two equal concerns that are worth trading off against each other. I don’t think they are equal, and I think most people would have far better impact if they completely struck “I’m worried this will advance timelines” from their thinking and instead focused solely on “how can I make AI risk better”.
I considered talking about why I think this is the case psychologically, but for the piece I felt it was more productive to focus on the object level arguments for why the tradeoffs people are making are bad. But to go into the psychological component a bit:
-Loss aversion: The fear of making AI risk worse is greater than the joy of making it better.
-Status quo bias: Doing something, especially something like working on AI capabilities, is seen as giving you responsibility for the problem. We see this with rhetoric against AGI labs—many in the alignment community will level terrible accusations against them, all while having to admit when pressed that it is plausible they are making AI risk better.
-Fear undermining probability estimates: I don’t know if there’s a catchy phrase for this one but I think it’s real. The impacts of any actions you take will be very muddy, indirect, and uncertain, because this is a big, long term problem. When you are afraid, this makes you view uncertain positive impacts with suspicion and makes you see uncertain negative impacts as more likely. So people doubt tenuous contributions to AI safety like “AI capability researchers worried about AI risk lend credibility to the problem, thereby making AI risk better”, but view tenuous contributions to AI risk like “you publish a capabilities paper, thereby speeding up timelines, making AI risk worse” as plausible.
> Notice that this is phrasing AI safety and AI timelines as two equal concerns that are worth trading off against each other. I don’t think they are equal, and I think most people would have far better impact if they completely struck “I’m worried this will advance timelines” from their thinking and instead focused solely on “how can I make AI risk better”.
This seems confused in many respects. AI safety is the thing I care about. I think AI timelines are a factor contributing to AI safety, via having more time to do AI safety technical research, and maybe also other things like getting better AI-related governance and institutions. You’re welcome to argue that shorter AI timelines other things equal do not make safe & beneficial AGI less likely—i.e., you can argue for: “Shortening AI timelines should be excluded from cost-benefit analysis because it is not a cost in the first place.” Some people believe that, although I happen to strongly disagree. Is that what you believe? If so, I’m confused. You should have just said it directly. It would make almost everything in this OP besides the point, right? I understood this OP to be taking the perspective that shortening AI timelines is bad, but the benefits of doing so greatly outweigh the costs, and the OP is mainly listing out various benefits of being willing to shorten timelines.
Putting that aside, “two equal concerns” is a strange phrasing. The whole idea of cost-benefit analysis is that the costs and benefits are generally not equal, and we’re trying to figure out which one is bigger (in the context of the decision in question).
If someone thinks that shortening AI timelines is bad, then I think they shouldn’t and won’t ignore that. If they estimate that, in a particular decision, they’re shortening AI timelines infinitesimally, in exchange for a much larger benefit, then they shouldn’t ignore that either. I think “shortening AI timelines is bad but you should completely ignore that fact in all your actions” is a really bad plan. Not all timeline-shortening actions have infinitesimal consequence, and not all associated safety benefits are much larger. In some cases it’s the other way around—massive timeline-shortening for infinitesimal benefit. You won’t know which it is in a particular circumstance if you declare a priori that you’re not going to think about it in the first place.
> …psychologically…
I think another “psychological” factor is a deontological / Hippocratic Oath / virtue kind of thing: “first, do no harm”. Somewhat relatedly, it can come across as hypocritical if someone is building AGI on weekdays and publicly advocating for everyone to stop building AGI on weekends. (I’m not agreeing or disagreeing with this paragraph, just stating an observation.)
> We see this with rhetoric against AGI labs—many in the alignment community will level terrible accusations against them, all while having to admit when pressed that it is plausible they are making AI risk better.
I think you’re confused about the perspective that you’re trying to argue against. Lots of people are very confident, including “when pressed”, that we’d probably be in a much better place right now if the big AGI labs (especially OpenAI) had never been founded. You can disagree, but you shouldn’t put words in people’s mouths.
The focus of the piece is on the cost of various methods taken to slow down AI timelines, with the thesis being that across a wide variety of different beliefs about the merit of slowing down AI, these costs aren’t worth it. I don’t think it’s confused to be agnostic about the merits of slowing down AI when the tradeoffs being taken are this bad.
Views on the merit of slowing down AI will be highly variable from person to person and will depend on a lot of extremely difficult and debatable premises that are nevertheless easy to have an opinion on. There is a place for debating all of those various premises and trying to nail down what exactly the benefit is of slowing down AI; but there is also a place for saying “hey, stop getting pulled in by that bike-shed and notice how these tradeoffs being taken are not worth it given pretty much any view on the benefit of slowing down AI”.
> I think you’re confused about the perspective that you’re trying to argue against. Lots of people are very confident, including “when pressed”, that we’d probably be in a much better place right now if the big AGI labs (especially OpenAI) had never been founded. You can disagree, but you shouldn’t put words in people’s mouths.
I was speaking from experience, having seen this dynamic play out multiple times. But yes, I’m aware that others are extremely confident in all kinds of specific and shaky premises.