gwern comments on Genetic fitness is a measure of selection strength, not the selection target

gwern 4 Nov 2023 21:52 UTC
LW: 16 AF: 2
3
AF
This seems to be making the same sort of deepity that Turntrout is making in his ‘reward is not the optimization target’, in taking a minor point about model-free RL approaches not necessarily building in any explicit optimization/planning for reward into their policy, and then people not understanding it because it ducks the major issue, while handwaving a lot of points. (Especially bad: infanticide is not a substitute for contraception because pregnancy is outrageously fatal and metabolically expensive, which is precisely why the introduction of contraception has huge effects everywhere it happens and why hunter-foragers have so many kids while contemporary women have fewer than they want to. Infanticide is just about the worst possible form of contraception short of the woman dying. I trust you would not argue that ‘suicide is just as effective contraceptive as infanticide or condoms’ using the same logic—after all, if the mother is dead, then there’s definitely no more kids...)

In particular, this fundamentally does not answer the challenge I posed earlier by pointing to instances of sperm bank donors who quite routinely rack up hundreds of offspring, while being in no way special other than having a highly-atypical urge to have lots of offspring. You can check this out very easily in seconds and verify that you could do the same thing with less effort than you’ve probably put into some video games. And yet, you continue to read this comment. Here, look, you’re still reading it. Seconds are ticking away while you continue to forfeit (I will be generous and pretend that a LWer is likely to have median number of kids) much more than 10,000% more fitness at next to no cost of any kind. And you know this because you are a model-based RL agent who can plan and predict the consequences of actions based solely on observations (like of text comments) without any additional rewards, you don’t have to wait for model-free mechanisms like evolution to slowly update your policy over countless rewards. You are perfectly able to predict that if the status quo lasted for enough millennia, this would stop being true; men would gradually be born with a baby-lust, and would flock to sperm donation banks (assuming such things even still existed under the escalating pressure); you know what the process of evolution would do and is doing right now very slowly, and yet, using your evolution-given brain, you still refuse to reap the fitness rewards of hundreds of offspring right now, in your generation, with yourself, for your genes. How is this not an excellent example of how under novel circumstances, inner-optimizers (like human brains) can almost all (serial sperm donor cases like hundreds out of billions) diverge extremely far (if forfeiting >10,000% is not diverging far, what would be?) from the optimization process’s reward function (within-generation increase in allele frequencies), while pursuing other rewards (whatever it is you are enjoying doing while very busy not ever donating sperm)? Certainly if AGI were as well-aligned with human values as we are with inclusive fitness, that doesn’t seem to bode very well for how human values will be fulfilled over time as the AGI-environment changes ever more rapidly & at scale—I don’t know what the ‘masturbation, porn, or condom of human values’ is, and I’d rather not find out empirically how diabolically clever reward hacks can be when found by superhuman optimization processes at scale targeting the original human values process...
What links here?
- rotatingpaguro's comment on Genetic fitness is a measure of selection strength, not the selection target by Kaj_Sotala (5 Nov 2023 2:08 UTC; 1 point)
- Nora Belrose 5 Nov 2023 5:25 UTC
  13 points
  −1
  Parent
  This seems to entirely ignore the actual point that is being made in the post. The point is that “IGF” is not a stable and contentful loss function, it is a misleadingly simple shorthand for “whatever traits are increasing their own frequency at the moment.” Once you see this, you notice two things:
  1. In some weak sense, we are fairly well “aligned” to the “traits” that were selected for in the ancestral environment, in particular our social instincts.
  2. All of the ways in which ML is disanalogous with evolution indicate that alignment will be dramatically easier and better for ML models. For starters, we don’t randomly change the objective function for ML models throughout training. See Quintin’s post for many more disanalogies.
  - quetzal_rainbow 5 Nov 2023 9:19 UTC
    9 points
    6
    Parent
    The main problem I have with this type of reasoning is an arbitrary drawn ontological boundaries. Why IGF is “not real” and ML objective function is “real”, while if we really zoom in training process, the verifiable in positivist brutal way real training goal is “whatever direction in coefficient space loss function decreases on current batch of data” which seems to me pretty corresponding to “whatever traits are spreading in current environment”?
- Kaj_Sotala 4 Nov 2023 22:54 UTC
  LW: 12 AF: 6
  1
  AF Parent
  infanticide is not a substitute for contraception
  I did not mean to say that they would be exactly equivalent nor that infanticide would be without significant downsides.
  How is this not an excellent example of how under novel circumstances, inner-optimizers (like human brains) can almost all (serial sperm donor cases like hundreds out of billions) diverge extremely far (if forfeiting >10,000% is not diverging far, what would be?) from the optimization process’s reward function (within-generation increase in allele frequencies), while pursuing other rewards (whatever it is you are enjoying doing while very busy not ever donating sperm)?
  “Inner optimizers diverging from the optimization process’s reward function” sounds to me like humans were already donating to sperm banks in the EEA, only for an inner optimizer to wreak havoc and sidetrack us from that. I assume you mean something different, since under that interpretation of what you mean the answer would be obvious—that we don’t need to invoke inner optimizers because there were no sperm banks in the EEA, so “that’s not the kind of behavior that evolution selected for” is a sufficient explanation.
- jacob_cannell 5 Nov 2023 2:20 UTC
  10 points
  3
  Parent
  The “why aren’t men all donating to sperm banks” argument assumes that 1.) evolution is optimizing for some simple reducible individual level IGF objective, and 2.) that anything less than max individual score on that objective over most individuals is failure.
  
  No AI we create will be perfectly aligned, so instead all that actually matters is the net utility that AI provides for its creators: something like the dot product between our desired future trajectory and that of the agents. More powerful agents/optimizers will move the world farther faster (longer trajectory vector) which will magnify the net effect of any fixed misalignment (cos angle between the vectors), sure. But that misalignment angle is only relevant/measurable relative to the net effect—and by that measure human brain evolution was an enormous unprecedented success according to evolutionary fitness.
  
  Evolution is a population optimization algorithm that explores a solution landscape via huge N number of samples in parallel, where individuals are the samples. Successful species with rapidly growing populations will naturally experience growth in variance/variation (ala adaptive radiation) as the population grows. Evolution only proceeds by running many many experiments, most of which must be failures in a struct score sense—that’s just how it works.
  
  Using even the median sample’s fitness would be like faulting SGD for every possible sample of the weights at any point during a training process. For SGD all that matters is the final sample, and likewise all that ‘matters’ for evolution is the tiny subset of most future fit individuals (which dominate the future distribution). To the extent we are/will use evolutionary algorithms for AGI design, we also select only the best samples to scale up, so only the alignment of the best samples is relevant for similar reasons.
  
  So if we are using individual human samples as our point of analogy comparison, the humans that matter for comparing the relative success of evolution at brain alignment are the most successful: modern sperm donors, genghis khan, etc. Evolution has maintained a sufficiently large sub population of humans who do explicitly optimize for IGF even in the modern environment (to the extent that makes sense translated into their ontology), so its doing very well in that regard (and indeed it always needs to maintain a large diverse high variance population distribution to enable quick adaptation to environmental changes).
  
  We aren’t even remotely close to stressing brain alignment to IGF. Most importantly we don’t observe species going extinct because they evolved general intelligence, experienced a sharp left turn, and then died out due to declining populations. But the sharp left turn argument does predict that, so its mostly wrong.
  - dxu 5 Nov 2023 3:39 UTC
    4 points
    1
    Parent
    
    No AI we create will be perfectly aligned, so instead all that actually matters is the net utility that AI provides for its creators: something like the dot product between our desired future trajectory and that of the agents. More powerful agents/optimizers will move the world farther faster (longer trajectory vector) which will magnify the net effect of any fixed misalignment (cos angle between the vectors), sure. But that misalignment angle is only relevant/measurable relative to the net effect—and by that measure human brain evolution was an enormous unprecedented success according to evolutionary fitness.
    
    The vector dot product model seems importantly false, for basically the reason sketched out in this comment; optimizing a misaligned proxy isn’t about taking a small delta and magnifying it, but about transitioning to an entirely different policy regime (vector space) where the dot product between our proxy and our true alignment target is much, much larger (effectively no different from that of any other randomly selected pair of vectors in the new space).
    
    (You could argue humans haven’t fully made that phase transition yet, and I would have some sympathy for that argument. But I see that as much more contingent than necessarily true, and mainly a consequence of the fact that, for all of our technological advances, we haven’t actually given rise to that many new options preferable to us but not to IGF. On the other hand, something like uploading I would expect to completely shatter any relation our behavior has to IGF maximization.)
    - jacob_cannell 5 Nov 2023 18:27 UTC
      8 points
      −1
      Parent
      
      The vector dot product model seems importantly false, for basically the reason sketched out in this comment;
      
      Notice I replied to that comment you linked and agreed with John, but not that any generalized vector dot product model is wrong, but that the specific one in that post is wrong as it doesn’t weight by expected probability ( ie an incorrect distance function).
      
      Anyway I used that only as a convenient example to illustrate a model which separates degree of misalignment from net impact, my general point does not depend on the details of the model and would still stand for any arbitrarily complex non-linear model.
      
      The general point being that degree of misalignment is only relevant to the extent it translates into a difference in net utility.
      
      You could argue humans haven’t fully made that phase transition yet, and I would have some sympathy for that argument.
      
      From the perspective of evolutionary fitness, humanity is the penultimate runaway success—AFAIK we are possibly the species with the fastest growth in fitness ever in the history of life. This completely overrides any and all arguments about possible misalignment, because any such misalignment is essentially epsilon in comparison to the fitness gain brains provided.
      
      For AGI, there is a singular correct notion of misalignment which actually matters: how does the creation of AGI—as an action—translate into differential utility, according to the utility function of its creators? If AGI is aligned to humanity about the same as brains are aligned to evolution, then AGI will result in an unimaginable increase in differential utility which vastly exceeds any slight misalignment.
      
      You can speculate all you want about the future and how brains may be become misaligned in the future, but that is just speculation.
      
      If you actually believe the sharp left turn argument holds water, where is the evidence?
      
      As as I said earlier this evidence must take a specific form, as evidence in the historical record:
      
      We aren’t even remotely close to stressing brain alignment to IGF. Most importantly we don’t observe species going extinct because they evolved general intelligence, experienced a sharp left turn, and then died out due to declining populations. But the sharp left turn argument does predict that, so its mostly wrong.
      - dxu 14 Nov 2023 5:25 UTC
        2 points
        0
        Parent
        
        Notice I replied to that comment you linked and agreed with John, but not that any generalized vector dot product model is wrong, but that the specific one in that post is wrong as it doesn’t weight by expected probability ( ie an incorrect distance function).
        
        Anyway I used that only as a convenient example to illustrate a model which separates degree of misalignment from net impact, my general point does not depend on the details of the model and would still stand for any arbitrarily complex non-linear model.
        
        The general point being that degree of misalignment is only relevant to the extent it translates into a difference in net utility.
        
        Sure, but if you need a complicated distance metric to describe your space, that makes it correspondingly harder to actually describe utility functions corresponding to vectors within that space which are “close” under that metric.
        
        If you actually believe the sharp left turn argument holds water, where is the evidence?
        
        As as I said earlier this evidence must take a specific form, as evidence in the historical record
        
        Hold on; why? Even for simple cases of goal misspecification, the misspecification may not become obvious without a sufficiently OOD environment; does that thereby mean that no misspecification has occurred?
        
        And in the human case, why does it not suffice to look at the internal motivations humans have, and describe plausible changes to the environment for which those motivations would then fail to correspond even approximately to IGF, as I did w.r.t. uploading?
        
        But I see that as much more contingent than necessarily true, and mainly a consequence of the fact that, for all of our technological advances, we haven’t actually given rise to that many new options preferable to us but not to IGF. On the other hand, something like uploading I would expect to completely shatter any relation our behavior has to IGF maximization.
        
        It seems to me that this suffices to establish that the primary barrier against such a breakdown in correspondence is that of insufficient capabilities—which is somewhat the point!
        jacob_cannell 14 Nov 2023 16:58 UTC
        2 points
        0
        Parent
        
        If you actually believe the sharp left turn argument holds water, where is the evidence? As as I said earlier this evidence must take a specific form, as evidence in the historical record
        
        Hold on; why? Even for simple cases of goal misspecification, the misspecification may not become obvious without a sufficiently OOD environment;
        
        Given any practical and reasonably aligned agent, there is always some set of conceivable OOD environments where that agent fails. Who cares? There is a single success criteria: utility in the real world! The success criteria is not “is this design perfectly aligned according to my adversarial pedantic critique”.
        
        The sharp left turn argument uses the analogy of brain evolution misaligned to IGF to suggest/argue for doom from misaligned AGI. But brains enormously increased human fitness rather than the predicted decrease, so the argument fails.
        
        In worlds where 1. alignment is very difficult, and 2. misalignment leads to doom (low utility) this would naturally translate into a great filter around intelligence—which we do not observe in the historical record. Evolution succeeded at brain alignment on the first try.
        
        And in the human case, why does it not suffice to look at the internal motivations humans have, and describe plausible changes to the environment for which those motivations would then fail
        
        I think this entire line of thinking is wrong—you have little idea what environmental changes are plausible and next to no idea of how brains would adapt.
        
        On the other hand, something like uploading I would expect to completely shatter any relation our behavior has to IGF maximization.
        
        When you move the discussion to speculative future technology to support the argument from a historical analogy—you have conceded that the historical analogy does not support your intended conclusion (and indeed it can not, because homo sapiens is an enormous alignment success).
        dxu 16 Nov 2023 3:12 UTC
        2 points
        0
        Parent
        It sounds like you’re arguing that uploading is impossible, and (more generally) have defined the idea of “sufficiently OOD environments” out of existence. That doesn’t seem like valid thinking to me.
        jacob_cannell 16 Nov 2023 3:48 UTC
        2 points
        0
        Parent
        Of course i’m not arguing that uploading is impossible, and obviously there are always hypothetical “sufficiently OOD environments”. But from the historical record so far we can only conclude that evolution’s alignments of brains was robust enough compared to the environment distribution shift encountered—so far. Naturally that could all change in the future, given enough time, but piling in such future predictions is clearly out of scope for an argument from historical analogy.
        
        These are just extremely different:
        
        an argument from historical observations
        an argument from future predicted observations
        
        It’s like I’m arguing that given that we observed the sequence 0,1,3,7 the pattern is probably 2^N-1, and you arguing that it isn’t because you predict the next digit is 31.
        
        Regardless uploads are arguably sufficiently categorically different that its questionable how they even relate to evolutionary success of homo sapien brain alignment to genetic fitness (do sims of humans count for genetic fitness? but only if DNA is modeled in some fashion? to what level of approximation? etc.)
        the gears to ascension 16 Nov 2023 5:03 UTC
        1 point
        0
        Parent
        Uploading is impossible because the cat ate the Internet cable again
        Nathaniel Monson 16 Nov 2023 7:13 UTC
        1 point
        0
        Parent
        Would you say it’s … _cat_egorically impossible?
- TurnTrout 6 Nov 2023 16:54 UTC
  LW: 6 AF: 4
  5
  AF Parent
  How is this not an excellent example of how under novel circumstances, inner-optimizers (like human brains) can almost all (serial sperm donor cases like hundreds out of billions) diverge extremely far (if forfeiting >10,000% is not diverging far, what would be?) from the optimization process’s reward function (within-generation increase in allele frequencies), while pursuing other rewards (whatever it is you are enjoying doing while very busy not ever donating sperm)?
  I think it’s inappropriate to use technical terms like “reward function” in the context of evolution, because evolution’s selection criteria serve vastly different mechanistic functions from eg a reward function in PPO.^[1] Calling them both a “reward function” makes it harder to think precisely about the similarities and differences between AI RL and evolution, while invalidly making the two processes seem more similar. That is something which must be argued for, and not implied through terminology.
  1. ^
    And yes, I wish that “reward function” weren’t also used for “the quantity which an exhaustive search RL agent argmaxes.” That’s bad too.
  - mesaoptimizer 6 Nov 2023 21:27 UTC
    4 points
    1
    Parent
    Yeah.
    
    The fact that we don’t have standard mechanistic models of optimization via selection (which is what evolution and moral mazes and inadequate equilibria and multipolar traps essentially are) is likely a fundamental source of confusion when trying to get people on the same page about the dangers of optimization and how relevant evolution is, as an analogy.
- Metacelsus 4 Nov 2023 22:25 UTC
  5 points
  2
  Parent
  >You can check this out very easily in seconds and verify that you could do the same thing with less effort than you’ve probably put into some video games.
  Indeed. Donating sperm over the Internet costs approximately $125 per donation (most of which is Fedex overnight shipping costs, and often the recipient will cover these) and has about a 10% pregnancy success rate per cycle.
  See: https://www.irvinesci.com/refrigeration-medium-tyb-with-gentamicin.html
  and https://www.justababy.com/
- AI-doom 5 Nov 2023 11:59 UTC
  3 points
  0
  Parent
  I agree that humans are not aligned with inclusive genetic fitness, but i think you could look at evolution as a bunch of different optimizers at any small stretch in time and not just a singel optimizer. If not getting killed by spiders is necessary for IGF for example, then evolution could be though off as both an optimizer for IGF and not getting killed by spiders. Some of these optimizers have created mesaoptimizers that resemble the original optimizer to a strong degree. Most people really care about their own biological children not dying for example. I think that thinking about evolution as multiple optimizers, makes it seem more likely that gradient descent is able to instill correct human values sometimes rather than never.
- teageegeepea 6 Nov 2023 1:41 UTC
  −1 points
  −1
  Parent
  Pregnancy is certainly costly (and the abnormally high miscarriage rate appears to be an attempt to save on such costs in case anything has gone wrong), but it’s not that fatal (for the mother). A German midwife recorded one maternal death out of 350 births.