reply to a general theme of recent discussion—the idea that uploads are even theoretically a useful solution for safety:
the first brain uploads are likely to have accuracy issues that amplify whatever unsafety is already present in a human.
humans are not reliably in the safety basin—not even (most?) of the ones seeking safety. in particular, many safety community members seem to have large blindspots that they defend as being important to their views on safety; it is my view that yudkowsky has given himself an anxiety disorder and that his ongoing insights are not as high quality as they seem to him. this is not to claim he is reliably wrong, merely that I wouldn’t trust him to do compressive self-distillation because I think he’d make the same mistakes he fears an initially partially aligned AI would. humans have adversarial example vulnerability too.
the first brain uploads are likely not to be faster than a human, since brains are already very thermally efficient for the computations they’re running. improved connectivity might make it possible to distill the upload down to a much smaller, higher-accuracy network, but then we’re reintroducing the compressive self-distillation commonly known as “self-improvement”, which is a significant fraction of the worry around the transition from soft asi to hard asi anyway.
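for concreteness, here is a minimal sketch of what “compressive distillation” means in the deep-learning sense being borrowed here: a large “teacher” network’s softened outputs are used as training targets for a much smaller “student”. this is an illustration only; every architecture, size, and dataset below is a placeholder, not anything from this discussion.

```python
# Minimal knowledge-distillation sketch (illustrative placeholders throughout):
# a small "student" network is trained to match the softened output
# distribution of a larger "teacher".
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's output distribution

for step in range(100):
    x = torch.randn(32, 128)             # stand-in for real inputs
    with torch.no_grad():
        teacher_logits = teacher(x)       # soft targets from the big model
    student_logits = student(x)
    # KL divergence between softened student and teacher distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

the student has far fewer parameters than the teacher, which is the “compressive” part; nothing in the training objective guarantees the student preserves properties the teacher only had implicitly, which is roughly the failure mode being worried about above.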
But surely some human uploads would be a good solution for safety, right? As a lower bound, if we had high-quality uploads of the alignment team, they could just do whatever they were going to do in the real world, but inside the emulation.
coming back to this I’m realizing I didn’t answer: no, I don’t think merely uploading the alignment team would help that much. the problem is that universalizing coprotection between arbitrary blocks of matter in a way that doesn’t have adversarial examples is really, really hard, and being on a digital computer doesn’t make you any faster at figuring it out. you could try to self-modify, but if you don’t have some solution to verifiable inter-matter safety, then you need to stay worried that you might be about to diverge. and I would expect almost any approach to uploads to introduce issues that are not detectable without a lot of work. if we are being serious about uploads as a proposal in the next two years, it would involve suddenly doing a lot of very advanced neuroscience to try to accurately model physical neurons. that’s actually not obviously off the table to me, but it doesn’t seem like an approach worth pushing.
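as a rough sense of scale for “accurately model physical neurons”: even one of the crudest single-neuron models used in computational neuroscience, a leaky integrate-and-fire unit, looks like the sketch below, and faithful emulation would need far more biophysical detail per neuron (ion channels, dendritic geometry, synaptic dynamics) multiplied across tens of billions of neurons. all parameters here are textbook-style placeholders, not fitted to any real neuron.

```python
# Leaky integrate-and-fire neuron: about the simplest "physical neuron model"
# there is. Real emulation would need vastly more detail than this.
import numpy as np

dt = 0.1e-3            # timestep: 0.1 ms
tau_m = 20e-3          # membrane time constant: 20 ms
v_rest, v_reset, v_thresh = -70e-3, -75e-3, -54e-3   # volts
r_m = 10e6             # membrane resistance: 10 MOhm
i_in = 2.0e-9          # constant input current: 2 nA (placeholder)

v = v_rest
spike_times = []
for step in range(int(1.0 / dt)):             # simulate 1 second
    dv = (-(v - v_rest) + r_m * i_in) / tau_m
    v += dv * dt
    if v >= v_thresh:                          # threshold crossing = spike
        spike_times.append(step * dt)
        v = v_reset

print(f"{len(spike_times)} spikes in 1 s of simulated time")
```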
My argument is that faithful, exact brain uploads are guaranteed not to help unless you had already solved AI safety anyhow. I do think we can simply solve ai extinction risk anyhow, but it requires us not only to prevent AI that does not follow orders, but also to prevent AI from “just following orders” to do things that some humans value but which abuse others. if we fall too far into the latter attractor, which we are at immediate risk of doing well before stably self-reflective AGI ever happens, we become guaranteed to go extinct shortly after, as corporations increasingly become just an ai and a human driver. eventually the strongest corporations are abusing larger and larger portions of humanity with one human at the helm. then one day ai can drive the entire economy...
it’s pretty much just the slower version of yudkowsky’s concerns. I think he’s wrong that self-distillation will be a quick snap-down onto the manifold of high-quality hypotheses, but other than that I think he’s on point. and because of that, I think the incremental behavior of the market is likely to pull us into a defection-only game-theory hole as society’s capabilities melt in the face of increased heat and chaos at various scales of the world.
I agree. And as it is presumably possible to clone EMs, you could still end up with a singleton.
Agreed that a WBE is no more aligned or alignable than a DL system, and this is a poor way for the community to spend its weirdness points. The good news is that in practical terms it is a non-issue. There is no way WBE will happen before superintelligence. I assign it a probability of well under 1%.
I think you are overconfident. Metaculus gives it 5%:
Well, I disagree strongly with Metaculus. Anyway, the most likely way that “human brain emulation [will] be the first successful route to human-level digital intelligence” would be using an understanding of the brain to engineer an intelligence (such as the Numenta approach), not a complete, faithful, exact reproduction of a specific human’s brain.
Please add your prediction to Metaculus then.
the metaculus community prediction is terribly calibrated, and not by accident: it’s simply the median of community predictions. it’s normal to find yourself disagreeing with the median prediction by a lot.
agreed. realistically we’d only approach anything resembling WBE by attempting behavior-cloning AI, which nicely demonstrates the issue you’d have after becoming a WBE. my point in making this comment is simply that uploading doesn’t help even in theory, assuming we somehow manage not to make an agent ASI and instead go straight for advanced neuron emulation. if we really, really tried, it would be possible to go for WBE first, but at this point it’s pretty obvious we can reach hard ASI without it, so nobody in charge of a team like deepmind is going to go for WBE when they can just focus directly on ai capability plus a dash of safety to make the nerds happy.
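for reference, “behavior cloning” here is the standard imitation-learning recipe: fit a policy by plain supervised learning on logged (observation, action) pairs from a demonstrator. a minimal sketch, with placeholder shapes and synthetic stand-in data rather than anything from this discussion:

```python
# Minimal behavior-cloning sketch (illustrative placeholders throughout):
# fit a policy network to recorded (observation, action) pairs.
import torch
import torch.nn as nn

obs_dim, n_actions = 32, 4                       # placeholder dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for a logged dataset of demonstrator behavior.
observations = torch.randn(1024, obs_dim)
actions = torch.randint(0, n_actions, (1024,))

for epoch in range(10):
    logits = policy(observations)
    loss = loss_fn(logits, actions)              # match the demonstrator's choices
    opt.zero_grad()
    loss.backward()
    opt.step()
```

a clone trained this way can at best reproduce the demonstrator’s recorded behavior, flaws included, which is presumably the issue the comment is pointing at for anyone who becomes a WBE.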