Thomas Kwa comments on The Hidden Complexity of Wishes

Thomas Kwa 24 Jan 2024 1:03 UTC
18 points
I think this post is wrong because it was written before quantilizers were known. The base rate of people being rescued from burning buildings is much higher than the rate of buildings exploding and hurling people out of them, so the Outcome Pump will only explode the building if the function strongly favors that over your mother being rescued. Even 99% reset probability is not enough to explode the building, unless it was already likely to explode.
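The conditioning argument can be written in odds form. This is a sketch that assumes the pump simply conditions an ordinary probability distribution over coarse outcomes on "no reset". The symbols $P(o)$ (prior probability of outcome $o$) and $r(o)$ (reset probability assigned to $o$) are notation introduced here, not taken from the post:

```latex
P(o \mid \text{no reset})
  = \frac{P(o)\,\bigl(1 - r(o)\bigr)}{\sum_{o'} P(o')\,\bigl(1 - r(o')\bigr)},
\qquad
\frac{P(a \mid \text{no reset})}{P(b \mid \text{no reset})}
  = \frac{P(a)}{P(b)} \cdot \frac{1 - r(a)}{1 - r(b)} .
```

With $r = 0.99$ for "mother still inside" and $r = 0$ for every outcome that gets her out, each acceptable outcome gains a factor of 100 relative to staying inside. The odds between rescue and explosion stay at their prior value, because both carry the same factor $1 - r = 1$.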
It may be that setting Pr(reset) to make most outcomes vastly unlikely, like Pr(reset) = [0 if distance(mother, building, 5 seconds from now) > 100 meters else 0.999999], causes some weird outcome like exploding the building. But allowing likely outcomes, e.g. Pr(reset) = [0 if distance(mother, building, 20 minutes from now) > 100 meters else 0.999], probably saves her, unless this was super unlikely to happen in the first place, in which case she jumps out and her body is carried away or something.
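The two reset functions can be tried out on a toy model. The sketch below is only illustrative: the three coarse histories, their prior probabilities, and the distances are made-up assumptions, and the pump is modelled as plain rejection sampling from a classical prior.

```python
import random
from collections import Counter

# Hypothetical toy prior over three coarse histories (made-up numbers).
# Distances are the mother's distance from the building, in metres.
HISTORIES = {
    "stays_inside": {"prior": 0.899999, "dist_5s": 0.0,   "dist_20min": 0.0},
    "rescued":      {"prior": 0.1,      "dist_5s": 0.0,   "dist_20min": 150.0},
    "explosion":    {"prior": 0.000001, "dist_5s": 120.0, "dist_20min": 120.0},
}

def reset_5s(h):
    """Pr(reset) = 0 if distance after 5 seconds > 100 m, else 0.999999."""
    return 0.0 if h["dist_5s"] > 100 else 0.999999

def reset_20min(h):
    """Pr(reset) = 0 if distance after 20 minutes > 100 m, else 0.999."""
    return 0.0 if h["dist_20min"] > 100 else 0.999

def exact_posterior(reset_prob):
    """Distribution over histories after conditioning on 'no reset'."""
    weights = {name: h["prior"] * (1 - reset_prob(h))
               for name, h in HISTORIES.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def run_pump(reset_prob, rng):
    """Rerun the toy world until a history survives the reset check."""
    names = list(HISTORIES)
    priors = [HISTORIES[n]["prior"] for n in names]
    while True:
        name = rng.choices(names, weights=priors)[0]
        if rng.random() >= reset_prob(HISTORIES[name]):
            return name

post_5s = exact_posterior(reset_5s)
post_20min = exact_posterior(reset_20min)

# Monte Carlo check of the 20-minute rule (the 5-second rule would need
# hundreds of thousands of reruns per sample, so only the exact value is used).
rng = random.Random(0)
counts = Counter(run_pump(reset_20min, rng) for _ in range(2000))
```

Under these made-up priors, the 5-second rule leaves the explosion with roughly half of the surviving probability mass, while the 20-minute rule puts more than 99% on the rescue. Different priors would of course shift both figures.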
Basically, this post implies that all wishes are unsafe. But only wishes with very low prior probability are unsafe.
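One way to make this last claim precise, again assuming the pump only conditions a prior $P$ on "no reset": let $B$ be any set of histories (say, the weird ones) and $r(\omega) \in [0, 1]$ the reset probability of history $\omega$. Both symbols are notation introduced here.

```latex
P(B \mid \text{no reset})
  = \frac{\mathbb{E}_P\!\left[\mathbf{1}_B(\omega)\,\bigl(1 - r(\omega)\bigr)\right]}
         {\mathbb{E}_P\!\left[1 - r(\omega)\right]}
  \le \frac{P(B)}{P(\text{no reset})},
\qquad
P(\text{no reset}) = \mathbb{E}_P\!\left[1 - r(\omega)\right].
```

So conditioning can amplify the probability of any outcome by at most a factor of $1 / P(\text{no reset})$. A wish that already comes true with prior probability 0.1 can raise a one-in-a-million accident to at most one in a hundred thousand; rare outcomes can dominate only when $P(\text{no reset})$ is itself tiny. This is the same kind of $1/q$ bound that motivates quantilizers.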
- Lucius Bushnaq 23 Feb 2024 22:10 UTC
  3 points
  Parent
  I figured the probability adjustments the pump was making were modifying Everett branch amplitude ratios, not probabilities in the sense of the reasoning tools that tiny human brains use to deal with incomplete knowledge of the world and logical uncertainty when predicting how a situation might go from past ‘base rates’. It’s unclear to me how you could make the latter concept of an outcome pump a coherent thing at all. The former, on the other hand, seems like the natural outcome of the time machine setup described. If you turn back time when the branch doesn’t have the outcome you like, only branches with the outcome you like will remain.
  I can even make up a physically realisable model of an outcome pump that acts roughly like the one described in the story without using time travel at all. You just need a bunch of high-quality sensors to take in data, an AI that judges from the observed data whether the condition set is satisfied, a tiny quantum random noise generator to respect the probability orderings desired, and a false vacuum bomb, which triggers immediately if the AI decides that the condition does not seem to be satisfied. The bomb works by causing a local decay of the metastable^[1] electroweak vacuum. This is a highly energetic, self-sustaining process once it gets going, and it spreads at the speed of light, effectively destroying the entire future light-cone and probably not even leaving the possibility for atoms and molecules to ever form again in that volume of space.^[2]
  
  So when the AI triggers the bomb or turns back time, the amplitude of earth in that branch basically disappears, leaving the users of the device to experience only the branches in which the improbable thing they want to have happen happens.
  And causing a burning building with a gas supply in it to blow up strikes me as something you can maybe do with a lot less random quantum noise than making your mother phase through the building. Firefighter brains are maybe comparatively easy to steer with quantum noise as well, but that only works if there are any physically nearby enough to reach the building in time to save your mother at the moment the pump is activated.
  This is also why the pump has a limit on how improbable an event it can make happen. If the event has an amplitude of roughly the same size as the amplitude for the pump’s sensors reporting bad data or otherwise causing the AI to make the wrong call, the pump will start being unreliable. If the event’s amplitude is much lower than the amplitude for the pump malfunctioning, it basically can’t do the job at all.
  1. In real life, it was an open question whether our local electroweak vacuum is in a metastable state last I checked, with the latest experimental evidence I’m aware of, from a couple of years ago, tentatively (ca. 3 sigma I think?) pointing to yes, though that calculation is probably assuming Standard Model physics, the applicability of which people can argue to hell and back. But it sure seems like a pretty self-consistent way for the world to be, so we can just declare that the fictional universe works like that. Substitute strangelets or any other conjectured instant-earth-annihilation method of your choice if you like.
  2. Because the mass terms for the elementary quantum fields would look all different now. Unclear to me that the bound structures of hadronic matter we are familiar with would still be a thing.
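The reliability limit described in the comment above (the pump becomes unreliable once the wished-for event is about as unlikely as a sensor or AI error) can be sketched numerically. This is a rough sketch under stated assumptions: branch weights are treated as ordinary probabilities, only false positives are modelled, and `P_FALSE_POSITIVE` and the listed event probabilities are hypothetical values chosen for illustration.

```python
# Hypothetical chance that the sensors/AI report success when the
# wished-for event did NOT actually happen (assumed value).
P_FALSE_POSITIVE = 1e-9

def fraction_genuine(p_event, p_false_positive):
    """Share of surviving branches in which the event really happened.

    A branch survives either because the event occurred (weight p_event)
    or because it did not occur but the pump wrongly judged that it did
    (weight (1 - p_event) * p_false_positive).
    """
    genuine = p_event
    spurious = (1.0 - p_event) * p_false_positive
    return genuine / (genuine + spurious)

# Sweep from "merely unlikely" to "far rarer than a malfunction".
for p_event in (1e-3, 1e-6, 1e-9, 1e-12):
    share = fraction_genuine(p_event, P_FALSE_POSITIVE)
    print(f"p_event={p_event:.0e}  genuine share={share:.6f}")
```

In this toy model the pump is nearly always right while the event is much more likely than a malfunction, is right about half the time when the two are equally likely, and is almost always fooled once the event is much rarer than a malfunction.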
- habryka 24 Jan 2024 1:38 UTC
  1 point
  Parent
  I don’t understand this. The post makes a reference to the Open-Source Wish Project, whose description says:
  The goal of the Open-Source Wish Project is to create perfectly-worded wishes, so that when the genie comes and grants us our wish we can get precisely what we want. The genie, of course, will attempt to interpret the wish in the most malicious way possible, using any loophole to turn our wish into a living hell. The Open-Source Wish Project hopes to use the collective wisdom of all humanity to create wishes with no loopholes whatsoever.
  The post is about how to phrase wishes in the context of something that is actively interested in subverting them.
  Even beyond that, I think “prior probability of a thing happening” is one kind of outcome pump, but the post does not specify that as the kind of outcome pump it’s talking about. “Minimal matter that needs to be modified”, “Minimal energy expenditure” or “complicated alien set of preferences that will be maximized along with your wish” are also reasonable priors for outcome pumps.
  I agree that I wish the post was clearer that certain kinds of outcome pumps might be fine, but I don’t understand the basis for saying the post is false, especially given the explicit reference to the Open-Source Wish Project which directly specifies they are dealing with a malicious genie.
  - Richard_Ngo 24 Jan 2024 1:47 UTC
    16 points
    Parent
    The outcome pump is defined in a way that excludes the possibility of active subversion: it literally just keeps rerunning until the outcome is satisfied, which is a way of sampling based on (some kind of) prior probability. Yudkowsky is arguing that this is equivalent to a malicious genie. But this is a claim that can be false.
    In this specific case, I agree with Thomas that whether or not it’s actually false will depend on the details of the function: “The further she gets from the building’s center, the less the time machine’s reset probability.” But there’s probably some not-too-complicated way to define it which would render the pump safe-ish (since this was a user-defined function).
    - habryka 24 Jan 2024 2:03 UTC
      5 points
      Parent
      Ah, rereading the post I think you are right:
      The Outcome Pump is not sentient. It contains a tiny time machine, which resets time unless a specified outcome occurs. For example, if you hooked up the Outcome Pump’s sensors to a coin, and specified that the time machine should keep resetting until it sees the coin come up heads, and then you actually flipped the coin, you would see the coin come up heads. (The physicists say that any future in which a “reset” occurs is inconsistent, and therefore never happens in the first place—so you aren’t actually killing any versions of yourself.)
      Whatever proposition you can manage to input into the Outcome Pump, somehow happens, though not in a way that violates the laws of physics. If you try to input a proposition that’s too unlikely, the time machine will suffer a spontaneous mechanical failure before that outcome ever occurs.
      I find this a bit confusing to think about. In a classical universe this machine is impossible. It seems like this basically relies on quantum uncertainty. The resulting probability distribution of events will definitely not reflect your prior probability distribution, so I think Thomas’ argument still doesn’t go through. The best guess I have is that it would reflect the shape of the quantum wave-function.
      My guess is at a practical level this ends up kind of close to “particles being moved the minimum necessary distance to achieve the outcome”, which I think would generally favor outcomes like “the building explodes”. I definitely don’t think it would favor outcomes like “the fire department arrives 5 minutes earlier” since any macro-level events like that would likely require sampling from much lower amplitude parts of the wave-function (or something, this also doesn’t seem super-compatible with an Everett-interpretation of quantum mechanics, but I can kind of squint and make it work with a Copenhagen-interpretation model).
      So I do think I was wrong about Eliezer not specifying how the outcome pump works, but I think his specification still suggests that the result would definitely not be anywhere close to sampling from your prior (which I think might result in reasonable outcomes), but would involve some pretty intense maximization and unintended outcomes as you start to put constraints on that prior.
      - Richard_Ngo 24 Jan 2024 2:13 UTC
        8 points
        Parent
        The resulting probability distribution of events will definitely not reflect your prior probability distribution, so I think Thomas’ argument still doesn’t go through. It will reflect the shape of the wave-function.
        This is a good point. But I don’t think “particles being moved the minimum necessary distance to achieve the outcome” actually favors explosions. I think it probably favors the sensor hardware getting corrupted, or it might actually favor messing with the firemen’s brains to make them decide to come earlier (or messing with your mother’s brain to make her jump out of the building), because both of these are highly sensitive systems where small changes can have large effects.
        Does this undermine the parable? Kinda, I think. If you built a machine that samples from some bizarre inhuman distribution, and then you get bizarre outcomes, then the problem is not really about your wish any more, the problem is that you built a weirdly-sampling machine. (And then we can debate about the extent to which NNs are weirdly-sampling machines, I guess.)
        - habryka 24 Jan 2024 2:31 UTC
          2 points
          Parent
          Does this undermine the parable? Kinda, I think. If you built a machine that samples from some bizarre inhuman distribution, and then you get bizarre outcomes, then the problem is not really about your wish any more, the problem is that you built a weirdly-sampling machine. (And then we can debate about the extent to which NNs are weirdly-sampling machines, I guess.)
          This is roughly how I would interpret the post. Physics itself is a bizarre inhuman distribution, and in general, many probability distributions from which you might want to sample will be bizarre and inhuman.
          Agree that it’s then arguable to what degree the optimization pressure of a mature AGI arising from NNs would also be bizarre. My guess is quite bizarre, since a lot of the constraints it will face will be constraints of physics.
  - Thomas Kwa 24 Jan 2024 2:00 UTC
    3 points
    Parent
    Even beyond that, I think “prior probability of a thing happening” is one kind of outcome pump, but the post does not specify that as the kind of outcome pump it’s talking about.
    Disagree. The Outcome Pump is explicitly described as conditioning the future trajectory of the universe according to the reset function:
    The Outcome Pump is not sentient. It contains a tiny time machine, which resets time unless a specified outcome occurs. For example, if you hooked up the Outcome Pump’s sensors to a coin, and specified that the time machine should keep resetting until it sees the coin come up heads, and then you actually flipped the coin, you would see the coin come up heads. (The physicists say that any future in which a “reset” occurs is inconsistent, and therefore never happens in the first place—so you aren’t actually killing any versions of yourself.)
    Also because the Outcome Pump is not sentient, it cannot be actively interested in subverting your wish. Eliezer claims “The Outcome Pump is a genie of the second class. No wish is safe.”, implying that the subversion effect will happen even with the non-sentient, quantilizer-like Outcome Pump. It may happen that future AIs are unsafe, but this will be because they apply too much optimization.
    - habryka 24 Jan 2024 2:05 UTC
      4 points
      Parent
      Yeah, see my response to Richard. I was wrong about the Outcome Pump not being specified, but I think that your use of “probability” in the top-level comment is still wrong. Clearly the outcome pump would not sample from your prior over likely events.
      It would sample from some universal prior over events (this is playing fast-and-loose with quantum mechanics, but a reasonable interpretation might be sampling from the quantum wave-function, if you take a more Copenhagen perspective). Almost any universal prior here would be very oddly shaped, so that indeed you would observe the kinds of things that Eliezer is talking about.
      - Thomas Kwa 24 Jan 2024 2:25 UTC
        6 points
        Parent
        I thought it was sampling from the quantum wavefunction, and still I think my argument works, unless this was a building that was basically deterministically going to kill your mother if you run physics from that point forward, or already had hazardous materials with a significant chance of exploding. I agree that you can’t use your own prior probabilities.
        Maybe I’m wrong about how much quantum randomness can influence events at a 5 minute timescale and the universe is actually very deterministic? If it’s very little such that you have to condition very hard to get anything to happen, then maybe the building does explode, but I’m not really sure what would happen.
        - habryka 24 Jan 2024 2:29 UTC
          2 points
          Parent
          As I said, the best approximation I have is “move particles the smallest joint distance from my highest prior configuration”. Some particles are in people’s brains, but changing people’s beliefs or intentions seems like it’s very unlikely to happen via this operation, since my guess is the brain is highly redundant and works on ion channels that would require actually a quite substantial amount of matter to be displaced (comparatively). Very locally causing a chemical chain reaction somewhere seems easier, though that’s just a guess.
          I am not really sure what happens here, since I think overall physics is highly deterministic even taking into account quantumness, and my guess is for a macro-level outcome here you would need to go very quickly into astronomically low probabilities if you sample from the wave-function, and I don’t trust my reasoning for what happens in 0.00000000000000000000001% scenarios.
          My best guess is that something pretty close to what Eliezer describes happens, but I couldn’t prove it to you.
          - Richard_Ngo 25 Jan 2024 1:25 UTC
            2 points
            Parent
            my guess is the brain is highly redundant and works on ion channels that would require actually a quite substantial amount of matter to be displaced (comparatively)
            Neurons are very small, though, compared with the size of a hole in a gas pipe that would be necessary to cause an explosive gas leak. (Especially because you then can’t control where the gas goes after leaking, so it could take a lot of intervention to give the person a bunch of away-from-building momentum.)
            I would probably agree with you if the building happened to have a ton of TNT sitting around in the basement.
            - habryka 25 Jan 2024 2:15 UTC
              2 points
              Parent
              Oh, I was definitely not thinking of a hole in a gas pipe. I was expecting something much much subtler than that (more like very highly localized temperature-increases which then chain-react). You are dealing with omniscient levels of consequence-control here.