This review really misses the mark I think.

The word “paperclip” does not appear anywhere in the book
The word “mesaoptimizer” does not appear anywhere in the book
Sure, but the same arguments are being made in different words. I agree that avoiding rationalist jargon makes it a better read for laypeople, but it doesn’t change the validity of the argument or the extent to which it reflects newer evidence. At its core, the book is about a deceptive mesaoptimizer that relentlessly steers the world towards a target as meaningless to us as paperclips.
In general the book moves somewhat away from abstraction and comments more on the empirical strangeness of AI
The way in which it comments on the “empirical strangeness of AI” is very biased. For instance, it fails to mention the many ways in which today’s rather general AIs don’t engage in weird, maximizing behavior or pursue unpredictable goals. Instead it mentions a few cases where AI systems did things we didn’t expect, like glitch tokens, which is incredibly weak empirical evidence for their claims.
Okay but they’re not actually using those things as evidence for their claims about generalization in the limit, which is explained through evolutionary metaphors. I agree that the argument itself is not very well explained, but if you can’t see how an MCTS searching over paths to an outcome, using a policy with complications like glitch tokens, could lead to bad outcomes, I’m not really sure what to tell you. Like, if your policy thinks a weird string is the highest-scoring thing (a category of error you absolutely see in real reward models), then that’s going to distort any search process that uses it as a policy. So if you just assume an ASI is a normal AI agent with a policy and a planner (not an insane assumption), and it has things like glitch tokens, you’re likely in for a bad time.
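To make that failure mode concrete, here is a minimal toy sketch (mine, not anything from the book; the vocabulary, the scoring rules, and the use of “ SolidGoldMagikarp” as a stand-in glitch token are all invented for illustration): a small beam search guided by a scorer that wildly overrates one weird string will fold that string into every plan it returns, even though the plans are worse by the measure we actually cared about.

```python
# Toy sketch, not the book's argument: a planner searching over token
# sequences with a scorer that has one mis-learned spike. The vocabulary,
# the scoring rules, and the stand-in glitch token are invented for
# illustration.

VOCAB = ["help", "the", "user", "politely", " SolidGoldMagikarp"]
GLITCH = " SolidGoldMagikarp"
HELPFUL = ("help", "user", "politely")

def learned_score(seq):
    """Stand-in reward model: sensible on normal tokens, but it overrates the glitch token."""
    score = sum(1.0 for tok in seq if tok in HELPFUL)
    if GLITCH in seq:
        score += 1000.0  # the mis-learned spike
    return score

def true_value(seq):
    """What we actually wanted: helpful tokens are good, emitting the weird string is bad."""
    return sum(1.0 for tok in seq if tok in HELPFUL) - 5.0 * seq.count(GLITCH)

def plan(depth=4, beam=3):
    """Greedy beam search over token sequences, guided only by the learned scorer."""
    frontier = [[]]
    for _ in range(depth):
        candidates = [seq + [tok] for seq in frontier for tok in VOCAB]
        candidates.sort(key=learned_score, reverse=True)
        frontier = candidates[:beam]
    return frontier[0]

best = plan()
print("planner's choice:", best)              # always includes the glitch token
print("learned score:", learned_score(best))  # huge, driven by the scorer's bug
print("true value:", true_value(best))        # negative: the search steered into the bug
```

A sharper search (MCTS, deeper beams) would, if anything, be better at finding the scorer’s bug.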
I was giving an inside-baseball review for the sort of person who has been following this for a while and wants to know if EY updated at all. And the answer is: yeah, he threw out a lot of the dumbest rhetoric.

“Okay but is the book good?”

Oh hell no.
Okay but they’re not actually using those things as evidence for their claims about generalization in the limit
Of course, because those things themselves are the claims about generalization in the limit that require justification
which is explained through evolutionary metaphors
Evolutionary metaphors don’t constitute an argument, and also don’t reflect the authors’ tendency to update, seeing as they’ve been using evolutionary metaphors since the beginning
don’t reflect the authors’ tendency to update, seeing as they’ve been using evolutionary metaphors since the beginning
This seems locally invalid. Eliezer, at least, has definitely used evolution in different ways and to make different points over the years. Originally he used the “alien god” analogy to show that optimization processes do not lead to niceness in general (in particular, no chaos or unpredictability required); now the authors use evolution for an “inner alignment is hard” analogy, mainly arguing that objective functions do not constrain generalization behavior enough to be useful for AGI alignment, and that therefore the goals of your system will be very chaotic.
I think this definitely constitutes an update; “inner alignment” concerns were not a thing in 2008.

I don’t see a big difference between

Originally he used the “alien god” analogy to show that optimization processes do not lead to niceness in general

and

now the authors use evolution for an “inner alignment is hard” analogy
It’s the difference between outer and inner alignment. The former makes the argument that it is possible for some intelligent optimizer to be misaligned with humans, and likely for “alien gods” such as evolution or your proposed AGI. It’s an argument about outer alignment not being trivial. It analogizes evolution to the AGI itself. Here is a typical example:
Why is Nature cruel? You, a human, can look at an Ichneumon wasp, and decide that it’s cruel to eat your prey alive. You can decide that if you’re going to eat your prey alive, you can at least have the decency to stop it from hurting. It would scarcely cost the wasp anything to anesthetize its prey as well as paralyze it. Or what about old elephants, who die of starvation when their last set of teeth fall out? These elephants aren’t going to reproduce anyway. What would it cost evolution—the evolution of elephants, rather—to ensure that the elephant dies right away, instead of slowly and in agony? What would it cost evolution to anesthetize the elephant, or give it pleasant dreams before it dies? Nothing; that elephant won’t reproduce more or less either way.
If you were talking to a fellow human, trying to resolve a conflict of interest, you would be in a good negotiating position—would have an easy job of persuasion. It would cost so little to anesthetize the prey, to let the elephant die without agony! Oh please, won’t you do it, kindly… um...
There’s no one to argue with.
Human beings fake their justifications, figure out what they want using one method, and then justify it using another method. There’s no Evolution of Elephants Fairy that’s trying to (a) figure out what’s best for elephants, and then (b) figure out how to justify it to the Evolutionary Overseer, who (c) doesn’t want to see reproductive fitness decreased, but is (d) willing to go along with the painless-death idea, so long as it doesn’t actually harm any genes.
There’s no advocate for the elephants anywhere in the system.
The latter analogizes evolution to the training process of your AGI. It doesn’t focus on the perfectly reasonable (for evolution) & optimal decisions your optimization criteria will make; it focuses on the staggering weirdness that happens to the organisms evolution creates outside their ancestral environment. Like humans’ taste for ice cream over “salted and honeyed raw bear fat”. This is not evolution coldly finding the genes most optimal for self-propagation; this is evolution going with the first “idea” it has which is marginally more fit in the ancestral environment, and then, ultimately, for no reason justified by inclusive genetic fitness, creating AGIs which don’t care a lick about inclusive genetic fitness.
That is, an iterative process which selects based on some criterion, and arrives at an AGI, need not also produce an AGI which itself optimizes that criterion outside the training/ancestral environment.
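As a toy sketch of that point (my own construction; the sweetness/calories numbers are invented, and only the ice-cream theme comes from the discussion above): select agents by calories gathered while they can only perceive sweetness, and the rule you get, “prefer sweet things”, tracks calories in training and stops tracking them the moment the menu changes.

```python
import random

# Toy sketch of "selection criterion != learned goal". The numbers and the
# sweetness/calories setup are invented for illustration.

TRAIN_MENU  = [(0.9, 100), (0.7, 80), (0.3, 30), (0.1, 10)]  # (sweetness, calories): correlated in "training"
DEPLOY_MENU = [(1.0, 0), (0.1, 120)]                         # "deployment": sweet zero-calorie item vs. bland calorie-dense one

def choose(pref, menu):
    """The agent only perceives sweetness; it eats the item its preference scores highest."""
    return max(menu, key=lambda item: pref * item[0])

def fitness(pref, menu):
    """Selection criterion: calories of whatever the agent actually eats."""
    return choose(pref, menu)[1]

# Hill climbing as a stand-in for evolution: mutate the preference, keep the
# mutation if it does at least as well on the training menu.
random.seed(0)
pref = 0.0
for _ in range(200):
    candidate = pref + random.uniform(-0.5, 0.5)
    if fitness(candidate, TRAIN_MENU) >= fitness(pref, TRAIN_MENU):
        pref = candidate

print("learned preference for sweetness:", round(pref, 2))
print("calories in training:", fitness(pref, TRAIN_MENU))    # high: the proxy tracked the criterion
print("calories in deployment:", fitness(pref, DEPLOY_MENU)) # zero: the agent picks the sweetener
```

The selection loop never touched the agent’s internals; it only graded outcomes, which is exactly why the learned rule and the selection criterion can come apart off-distribution.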
Fair, you’re right, I didn’t realize or had forgotten that the evolution analogy was previously used in the way it is in your pasted quote.