Okay but they’re not actually using those things as evidence for their claims about generalization in the limit
Of course, because those things themselves are the claims about generalization in the limit that require justification
which is explained through evolutionary metaphors
Evolutionary metaphors don’t constitute an argument, and also don’t reflect the authors’ tendency to update, seeing as they’ve been using evolutionary metaphors since the beginning
This seems locally invalid. Eliezer, at least, has definitely used evolution in different ways and to make different points over the years. Originally he used the “alien god” analogy to show that optimization processes do not lead to niceness in general (in particular, no chaos or unpredictability required); now they use evolution for an “inner alignment is hard” analogy, mainly arguing that objective functions do not constrain generalization behavior enough to be useful for AGI alignment, and that therefore the goals of your system will be very chaotic.
I think this definitely constitutes an update; “inner alignment” concerns were not a thing in 2008.
It’s the difference between outer and inner alignment. The former use makes the argument that it is possible for an intelligent optimizer to be misaligned with humans, and likely for “alien gods” such as evolution or your proposed AGI. It’s an argument about outer alignment not being trivial. It analogizes evolution to the AGI itself. Here is a typical example:
Why is Nature cruel? You, a human, can look at an Ichneumon wasp, and decide that it’s cruel to eat your prey alive. You can decide that if you’re going to eat your prey alive, you can at least have the decency to stop it from hurting. It would scarcely cost the wasp anything to anesthetize its prey as well as paralyze it. Or what about old elephants, who die of starvation when their last set of teeth fall out? These elephants aren’t going to reproduce anyway. What would it cost evolution—the evolution of elephants, rather—to ensure that the elephant dies right away, instead of slowly and in agony? What would it cost evolution to anesthetize the elephant, or give it pleasant dreams before it dies? Nothing; that elephant won’t reproduce more or less either way.
If you were talking to a fellow human, trying to resolve a conflict of interest, you would be in a good negotiating position—would have an easy job of persuasion. It would cost so little to anesthetize the prey, to let the elephant die without agony! Oh please, won’t you do it, kindly… um...
There’s no one to argue with.
Human beings fake their justifications, figure out what they want using one method, and then justify it using another method. There’s no Evolution of Elephants Fairy that’s trying to (a) figure out what’s best for elephants, and then (b) figure out how to justify it to the Evolutionary Overseer, who (c) doesn’t want to see reproductive fitness decreased, but is (d) willing to go along with the painless-death idea, so long as it doesn’t actually harm any genes.
There’s no advocate for the elephants anywhere in the system.
The latter use analogizes evolution to the training process of your AGI. It doesn’t focus on the perfectly reasonable (for evolution) and optimal decisions your optimization criteria will make; it focuses on the staggering weirdness that happens to the organisms evolution creates outside their ancestral environment, like humans’ taste for ice cream over “salted and honeyed raw bear fat”. This is not evolution coldly finding the most optimal genes for self-propagation; this is evolution going with the first “idea” it has which is marginally more fit in the ancestral environment, and then ultimately, for no reason justified by inclusive genetic fitness, creating AGIs which don’t care a lick about inclusive genetic fitness.
That is, an iterative process which selects based on some criterion and arrives at an AGI need not thereby produce an AGI which itself optimizes that criterion outside the training/ancestral environment.
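To make that last claim concrete, here is a minimal toy sketch. It is only an illustration under invented assumptions, not code from the thread or from the authors: the “foods”, feature names, and numbers are all made up. A crude hill-climbing “evolution” selects a taste policy purely by the nutrition it obtains in an ancestral environment where taste tracks nutrition, and then the same policy is handed a modern menu where that correlation is broken.

```python
# Toy sketch (illustration only; all names and numbers are invented).
# Selection scores policies by nutrition obtained in the ancestral
# environment, but the policy can only condition on perceptible proxies
# (sweetness, fat), so its behavior comes apart from the selection
# criterion once the proxy-nutrition correlation breaks.

import random

random.seed(0)


def ancestral_food():
    """Return (sweetness, fat, nutrition); here taste tracks nutrition."""
    nutrition = random.uniform(0, 1)
    return (nutrition + random.gauss(0, 0.05),
            nutrition + random.gauss(0, 0.05),
            nutrition)


# Modern menu: the proxy-nutrition correlation is deliberately broken.
MODERN_MENU = {
    "ice cream": (0.95, 0.95, 0.10),
    "salted and honeyed raw bear fat": (0.30, 0.35, 0.90),
}


def taste_score(policy, food):
    # The policy only ever sees taste features, never nutrition itself.
    return policy[0] * food[0] + policy[1] * food[1]


def ancestral_fitness(policy, trials=200):
    """Selection criterion: average nutrition obtained in the old environment."""
    total = 0.0
    for _ in range(trials):
        options = [ancestral_food() for _ in range(3)]
        total += max(options, key=lambda f: taste_score(policy, f))[2]
    return total / trials


# "Evolution": crude hill-climbing on ancestral fitness.
policy = [random.uniform(-1, 1), random.uniform(-1, 1)]
for _ in range(300):
    mutant = [w + random.gauss(0, 0.1) for w in policy]
    if ancestral_fitness(mutant) >= ancestral_fitness(policy):
        policy = mutant

choice = max(MODERN_MENU, key=lambda name: taste_score(policy, MODERN_MENU[name]))
print("evolved taste weights (sweetness, fat):", [round(w, 2) for w in policy])
print("choice in the modern environment:", choice)
# The policy was selected solely for ancestral nutrition, yet outside that
# environment it picks whatever merely tastes best, typically the ice cream.
```

The selection criterion (nutrition) never changes; only the environment does, and that is enough for the selected policy’s behavior to come apart from the criterion it was selected on.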
Fair, you’re right; I didn’t realize, or had forgotten, that the evolution analogy was previously used in the way it is in your pasted quote.