johnswentworth comments on Discussion with Eliezer Yudkowsky on AGI interventions

johnswentworth 12 Nov 2021 16:03 UTC
LW: 56 AF: 26
… I find that most people working on alignment are trying far harder to justify why they expect their work to matter than EY and the old-school MIRI team ever did.
You’ve had a few comments along these lines in this thread, and I think this is where you’re most severely failing to see the situation from Yudkowsky’s point of view.
From Yudkowsky’s view, explaining and justifying MIRI’s work (and the processes he uses to reach such judgements more generally) was the main point of the sequences. He has written more on the topic than anyone else in the world, by a wide margin. He basically spent several years full-time just trying to get everyone up to speed, because the inductive gap was very very wide.
When I put on my Yudkowsky hat and look at both the OP and your comments through that lens… I imagine if I were Yudkowsky I’d feel pretty exasperated at this point. Like, he’s written a massive volume on the topic, and now ten years later a large chunk of people haven’t even bothered to read it. (In particular, I know (because it’s come up in conversation) that at least a few of the people who talk about prosaic alignment a lot haven’t read the sequences, and I suspect that a disproportionate number haven’t. I don’t mean to point fingers or cast blame here, the sequences are a lot of material and most of it is not legibly relevant before reading it all, but if you haven’t read the sequences and you’re wondering why MIRI doesn’t have a write-up on why they’re not excited about prosaic alignment… well, that’s kinda the write-up. Also I feel like I need a disclaimer here that many people excited about prosaic alignment have read the sequences, I definitely don’t mean to imply that this is everyone in the category.)
(To be clear, I don’t think the sequences explain all of the pieces behind Yudkowsky’s views of prosaic alignment, in depth. They were written for a different use-case. But I do think they explain a lot.)
Related: IMO the best roughly-up-to-date piece explaining the Yudkowsky/MIRI viewpoint is The Rocket Alignment Problem.
What links here?
- adamShimi's comment on Discussion with Eliezer Yudkowsky on AGI interventions by Rob Bensinger (15 Nov 2021 14:52 UTC; 121 points)
- adamShimi 12 Nov 2021 16:24 UTC
  LW: 7 AF: 5
  Thanks for the pushback!
  You’ve had a few comments along these lines in this thread, and I think this is where you’re most severely failing to see the situation from Yudkowsky’s point of view.
  From Yudkowsky’s view, explaining and justifying MIRI’s work (and the processes he uses to reach such judgements more generally) was the main point of the sequences. He has written more on the topic than anyone else in the world, by a wide margin. He basically spent several years full-time just trying to get everyone up to speed, because the inductive gap was very very wide.
  My memory of the sequences is that they’re far more about defending and explaining the alignment problem than criticizing prosaic AGI (maybe because the term couldn’t have been used years before Paul coined it?). Could you give me the best pointers to prosaic alignment criticism in the sequences? (I’ve read the sequences, but I don’t remember every single post, and my impression from memory is what I’ve written above.)
  I also feel that there might be a discrepancy between who I think of when I think of prosaic alignment researchers and what the category means in general/to most people here? My category mostly includes AF posters, people from a bunch of places like EleutherAI/OpenAI/DeepMind/Anthropic/Redwood and people from CHAI and FHI. I expect most of these people to actually have read the sequences, and tried to understand MIRI’s perspective. Maybe someone could point out a list of other places where prosaic alignment research is being done that I’m missing, especially places where people probably haven’t read the sequences? Or maybe I’m overestimating how many of the people in the places I mentioned have read the sequences?
  - johnswentworth 12 Nov 2021 16:51 UTC
    LW: 53 AF: 26
    I don’t mean to say that there’s critique of prosaic alignment specifically in the sequences. Rather, a lot of the generators of the Yudkowsky-esque worldview are in there. (That is how the sequences work: it’s not about arguing specific ideas around alignment, it’s about explaining enough of the background frames and generators that the argument becomes unnecessary. “Raise the sanity waterline” and all that.)
    For instance, just the other day I ran across this:
    Of this I learn the lesson: You cannot manipulate confusion. You cannot make clever plans to work around the holes in your understanding. You can’t even make “best guesses” about things which fundamentally confuse you, and relate them to other confusing things. Well, you can, but you won’t get it right, until your confusion dissolves. Confusion exists in the mind, not in the reality, and trying to treat it like something you can pick up and move around, will only result in unintentional comedy.
    Similarly, you cannot come up with clever reasons why the gaps in your model don’t matter. You cannot draw a border around the mystery, put on neat handles that let you use the Mysterious Thing without really understanding it—like my attempt to make the possibility that life is meaningless cancel out of an expected utility formula. You can’t pick up the gap and manipulate it.
    If the blank spot on your map conceals a land mine, then putting your weight down on that spot will be fatal, no matter how good your excuse for not knowing. Any black box could contain a trap, and there’s no way to know except opening up the black box and looking inside. If you come up with some righteous justification for why you need to rush on ahead with the best understanding you have—the trap goes off.
    (The earlier part of the post had a couple embarrassing stories of mistakes Yudkowsky made earlier, which is where the lesson came from.) Reading that, I was like, “man that sure does sound like the Yudkowsky-esque viewpoint on prosaic alignment”.
    Or maybe I’m overestimating how many of the people in the places I mentioned have read the sequences?
    I think you are overestimating. At the orgs you list, I’d guess at least 25% and probably more than half have not read the sequences. (Low confidence/wide error bars, though.)