After skimming through this post, I thought the OP wouldn’t pass an ITT for one of the perspectives he tried to represent in the post, so I was surprised to see it in my mailbox later. The contrast is between a more imaginary position and a less imaginary position. E.g., I would be happy to bet that Yudkowsky or Soares, when asked, would say they wouldn’t have mentioned the fact that “capabilities generalize further than alignment” in the context it was mentioned in. We can discuss the odds if you’re potentially interested. I don’t think anyone even remotely close to understanding this fact would use it in that context.
I don’t think anyone even remotely close to understanding this fact would use it in that context.
I dispute this claim being called a “fact”, but I’m open to having my mind changed. Is there a clear explanation of this claim somewhere? I’ve read Lethalities and assorted bits of Nate’s writing.
I haven’t seen a clear explanation that I’d expect to change your mind. Some people understood the dynamic the claim refers to after reading the sharp left turn post (though most people I talked to didn’t, and I’m not sure whether they actually read it or just skimmed it). If you haven’t read it, it might be worth a try.
Thanks for commenting. Is it any better if I delete that specific phrase, such that Doomimir’s line ends on “the results look like random garbage to you”? (Post edited.) I think there’s a legitimate point being made in that line (that the repetition behavior isn’t necessarily “wrong” “from the LLM’s perspective”, however wrong it looks to the user), even if it’s not the same point as “List of Lethalities” #21 (and the text I originally wrote was bad for suggesting that it was).