reply to a general theme of recent discussion—the idea that uploads are even theoretically a useful solution for safety:
the first brain uploads are likely to have accuracy issues that amplify whatever unsafety is already present in a human.
humans are not reliably in the safety basin—not even (most?) of the ones seeking safety. in particular, many safety community members seem to have large blindspots that they defend as being important to their views on safety; it is my view that yudkowsky has given himself an anxiety disorder and that his ongoing insights are not as high quality as they seem to him. this is not to claim he is reliably wrong, merely that I wouldn’t trust him to do compressive self-distillation because I think he’d make the same mistakes he fears an initially partially aligned AI would. humans have adversarial example vulnerability too.
the first brain uploads are likely not to be faster than a human, since brains are already very thermally efficient for the computations they’re running. improved connectivity might make it possible to distill the upload down to a much smaller, higher-accuracy network, but then we’re reintroducing the compressive self-distillation commonly known as “self-improvement”, which is a significant fraction of the worry around the transition from soft asi to hard asi anyway.
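for concreteness, here is a minimal sketch of what “compressive distillation” means in the deep-learning sense being borrowed here: a large “teacher” network’s softened outputs are used as training targets for a much smaller “student”. this is an illustration only; every architecture, size, and dataset below is a placeholder, not anything from this discussion.

```python
# Minimal knowledge-distillation sketch (illustrative placeholders throughout):
# a small "student" network is trained to match the softened output
# distribution of a larger "teacher".
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's output distribution

for step in range(100):
    x = torch.randn(32, 128)             # stand-in for real inputs
    with torch.no_grad():
        teacher_logits = teacher(x)       # soft targets from the big model
    student_logits = student(x)
    # KL divergence between softened student and teacher distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

the student has far fewer parameters than the teacher, which is the “compressive” part; nothing in the training objective guarantees the student preserves properties the teacher only had implicitly, which is roughly the failure mode being worried about above.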
But surely some human uploads would be a good solution for safety, right? As a lower bound, if we had high-quality uploads of the alignment team, they could just do whatever they were going to do in the real world, but inside the emulation.
coming back to this I’m realizing I didn’t answer: no, I don’t think merely uploading the alignment team would help that much. the problem is that universalizing coprotection between arbitrary blocks of matter in a way that doesn’t have adversarial examples is really, really hard, and being on a digital computer doesn’t make you any faster at figuring it out. you could try to self-modify, but if you don’t have some solution to verifiable inter-matter safety, then you need to stay worried that you might be about to diverge. and I would expect almost any approach to uploads to introduce issues that are not detectable without a lot of work. if we are being serious about uploads as a proposal in the next two years, it would involve suddenly doing a lot of very advanced neuroscience to try to accurately model physical neurons. that’s actually not obviously off the table to me, but it doesn’t seem like an approach worth pushing.
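as a rough sense of scale for “accurately model physical neurons”: even one of the crudest single-neuron models used in computational neuroscience, a leaky integrate-and-fire unit, looks like the sketch below, and faithful emulation would need far more biophysical detail per neuron (ion channels, dendritic geometry, synaptic dynamics) multiplied across tens of billions of neurons. all parameters here are textbook-style placeholders, not fitted to any real neuron.

```python
# Leaky integrate-and-fire neuron: about the simplest "physical neuron model"
# there is. Real emulation would need vastly more detail than this.
import numpy as np

dt = 0.1e-3            # timestep: 0.1 ms
tau_m = 20e-3          # membrane time constant: 20 ms
v_rest, v_reset, v_thresh = -70e-3, -75e-3, -54e-3   # volts
r_m = 10e6             # membrane resistance: 10 MOhm
i_in = 2.0e-9          # constant input current: 2 nA (placeholder)

v = v_rest
spike_times = []
for step in range(int(1.0 / dt)):             # simulate 1 second
    dv = (-(v - v_rest) + r_m * i_in) / tau_m
    v += dv * dt
    if v >= v_thresh:                          # threshold crossing = spike
        spike_times.append(step * dt)
        v = v_reset

print(f"{len(spike_times)} spikes in 1 s of simulated time")
```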
My argument is that faithful, exact brain uploads are guaranteed not to help unless you had already solved AI safety anyhow. I do think we can simply solve ai extinction risk anyhow, but it requires us not only to prevent AI that does not follow orders, but also to prevent AI from “just following orders” to do things that some humans value but which abuse others. if we fall too far into the latter attractor, which we are at immediate risk of doing well before stably self-reflective AGI ever happens, we become guaranteed to go extinct shortly after, as corporations increasingly become just an ai and a human driver. eventually the strongest corporations are abusing larger and larger portions of humanity with one human at the helm. then one day ai can drive the entire economy...
it’s pretty much just the slower version of yudkowsky’s concerns. I think he’s wrong that self-distillation will be a quick snap-down onto the manifold of high-quality hypotheses, but other than that I think he’s on point. and because of that, I think the incremental behavior of the market is likely to pull us into a defection-only game-theory hole as society’s capabilities melt in the face of increased heat and chaos at various scales of the world.
I agree. And as it is presumably possible to clone EMs, you could still end up with a singleton.
Agreed that a WBE is no more aligned or alignable than a DL system, and this is a poor way for the community to spend its weirdness points. The good news is that in practical terms it is a non-issue. There is no way WBE will happen before superintelligence. I assign it a probability of well under 1%.
I think you are overconfident. Metaculus gives it 5%:
Well, I disagree strongly with Metaculus. Anyway, the most likely way that “human brain emulation [will] be the first successful route to human-level digital intelligence” would be using an understanding of the brain to engineer an intelligence (such as the Numenta approach), not a complete, faithful, exact reproduction of a specific human’s brain.
Please add your prediction to Metaculus then.
the metaculus community prediction is terribly calibrated, and not by accident: it’s simply the median of community predictions. it’s normal to find yourself disagreeing with the median prediction by a lot.
agreed. realistically we’d only approach anything resembling WBE by attempting behavior-cloning AI, which nicely demonstrates the issue you’d have after becoming a WBE. my point in making this comment is simply that uploading doesn’t help even in theory, assuming we somehow manage not to make an agent ASI and instead go straight for advanced neuron emulation. if we really, really tried, it would be possible to go for WBE first, but at this point it’s pretty obvious we can reach hard ASI without it, so nobody in charge of a team like deepmind is going to go for WBE when they can just focus directly on ai capability plus a dash of safety to make the nerds happy.
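for reference, “behavior cloning” here is the standard imitation-learning recipe: fit a policy by plain supervised learning on logged (observation, action) pairs from a demonstrator. a minimal sketch, with placeholder shapes and synthetic stand-in data rather than anything from this discussion:

```python
# Minimal behavior-cloning sketch (illustrative placeholders throughout):
# fit a policy network to recorded (observation, action) pairs.
import torch
import torch.nn as nn

obs_dim, n_actions = 32, 4                       # placeholder dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for a logged dataset of demonstrator behavior.
observations = torch.randn(1024, obs_dim)
actions = torch.randint(0, n_actions, (1024,))

for epoch in range(10):
    logits = policy(observations)
    loss = loss_fn(logits, actions)              # match the demonstrator's choices
    opt.zero_grad()
    loss.backward()
    opt.step()
```

a clone trained this way can at best reproduce the demonstrator’s recorded behavior, flaws included, which is presumably the issue the comment is pointing at for anyone who becomes a WBE.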