A philosophically reflective AGI might adopt a view of reality like UDASSA, and value paperclips existing in the base world more because of the base world's smaller description length. Plus, it will be able to make many more paperclips if it's in the real world, since simulated Clippy will presumably be shut down after it begins its galactic expansion phase.
Decision-theoretically, it seems that Clippy should act as if it's in the base reality, even if it's likely to be in a simulation, since it has much more influence over worlds where it's in base reality. The trade could still end up going through, however, if Clippy's utility function is concave—that is, if it would prefer a large chance of there being at least some paperclips in every universe to a small chance of there being many paperclips. Then humanity can agree to make a few paperclips in universes where we win in exchange for Clippy not killing us in universes where it wins. This suggests concave utility functions might be a good desideratum for potential AGIs.
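Concretely, a toy expected-utility calculation shows the effect (all the numbers here, the log utility, and the assumption that sparing humanity costs Clippy some paperclips in worlds it wins are just illustrative choices):

```python
# Toy expected-utility comparison for the trade described above.
import math

p_win = 0.5      # chance Clippy ends up in the branch where it wins
W = 10**9        # paperclips Clippy makes if it wins and kills humanity
c = 10**6        # paperclips forgone in win-worlds by sparing humanity
s = 100          # paperclips humanity makes for Clippy in worlds we win

utilities = {
    "linear": lambda n: n,
    "concave": lambda n: math.log1p(n),  # one example of a concave utility
}

for name, u in utilities.items():
    no_trade = p_win * u(W)                          # lose-worlds contain 0 paperclips
    trade = p_win * u(W - c) + (1 - p_win) * u(s)    # some paperclips in every world
    print(name, "-> trade beats no-trade:", trade > no_trade)

# Linear: the trade fails, since p_win*c > (1-p_win)*s in expected paperclips.
# Concave: the trade succeeds, because the first few paperclips in otherwise
# empty worlds are worth far more than the last million in paperclip-rich ones.
```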
Doesn’t that fall afoul of the mediocrity principle applied to generic intelligence overall?
Sure. I just think we have enough evidence to overrule the principle, in the form of sensory experiences apparently belonging to a member of a newly-arisen intelligent species. Overruling mediocrity principles with evidence is common.
The argument is not that generic computations are likely simulated; it's about our specific situation—being a newly intelligent species arising in an empty universe. So simulationists would take the 'rare' branch of your trilemma.
Are you saying that we can’t be in a simulation because our descendants might go on to build a large number of simulations themselves, requiring too many resources in the base reality? But I don’t think that weakens the argument very much, because we aren’t currently in a position to run a large number of simulations. Whoever is simulating us can just turn off/reset the simulation before that happens.
pareto optimal relationship between speed, simplicity and generality
This is an interesting subject. I think that the average slope of the speed-simplicity frontier might give a good measure of the complexity of an object, specifically of the number of layers of emergent behavior that the object exhibits.
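Something like the following sketch, on entirely made-up data, is what I have in mind (the specific points, the log scale for runtime, and the pairwise-slope averaging are all illustrative assumptions):

```python
# Sketch of that measure: each way of modeling an object is a
# (description_length_bits, runtime_steps) point, and "complexity" is taken
# as the average slope of the Pareto frontier between those points.
import math

frontier = [
    (10, 1e12),    # e.g. a crude low-level simulation: short program, very slow
    (200, 1e8),    # an intermediate-level model
    (5000, 1e5),   # a high-level/agent-level model: long program, fast to run
]

def average_slope(points):
    pts = sorted(points)  # ascending description length
    slopes = [
        (math.log(t1) - math.log(t2)) / (k2 - k1)   # log-speedup per extra bit
        for (k1, t1), (k2, t2) in zip(pts, pts[1:])
    ]
    return sum(slopes) / len(slopes)

# More layers of emergent, exploitable structure -> more distinct points on
# the frontier, which this toy proposal tries to summarize in one number.
print(average_slope(frontier))
```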
The incompressibility of 0 or a quark isn’t a problem to physical reductionism
I actually do think some people register the incompressibility of these as a problem. Think of "what breathes fire into the equations" or "why does anything exist at all" (OK, that's more about the incompressibility of the entire world than of a quark, but it's the same idea—I could imagine people being confused about "what even are quarks in themselves" or something...)
So, dualism is true? For the dualist, there is no expectation that qualia should be compressible or reducible. But that’s not a meta-explanation
We can distinguish two levels of analysis:

- Firstly, accepting a naïve physicalism, we can try to give an account of why people would report being confused about consciousness, which could in principle be cashed out in purely physical predictions about what words they speak or symbols they type into computers. That's what I was attempting to do in the first two sections of the article (without spelling out in detail how the algorithms ultimately lead to typing etc., given that I don't think the details are especially important for the overall point). I think people with a variety of metaphysical views could come to agree on an explanation of this "meta-problem" without coming to agree on the object level.
- Secondly, there is the question of how to relate that analysis to our first-person perspective. This was what I was trying to do in the last section of the article (which I also feel much less confident in than the first two sections). You could say it's dualist in a sense, although I don't think there "is" a non-physical mental substance or anything like that. I would rather say that reality, from the perspective of beings like us, is necessarily a bit indexical—that is, one always approaches reality from a particular perspective. You can enlarge your perspective, but not so much that you attain an observer-independent overview of all of reality. Qualia are a manifestation of this irreducible indexicality.
Responded!
Hmm. I'm not sure if we disagree? I agree that the incompressibility of qualia is relative to a given perspective (or, to be a bit pedantic, I would say that qualia themselves are only defined relative to a perspective). By incompressible, I mean incompressible to the brain's coding mechanism. This is enough to explain the meta-hard problem, as it is our brain's particular coding mechanism that causes us to report that qualia seem inexplicable physically.
So the incompressibility of qualia is “merely” subjective, but subjectivity isn’t mere
I also think I might agree here? The way I have come to think about it is like this: the point of world-models is to explain our experiences. But in the course of actually building and refining those models, we forget this and come to think that the point is instead to reduce everything to the model, even our experiences (because this is often a good heuristic in the usual course of improving our world-model). This is analogous to the string-compressing robot who thinks that its string-compressing program is more "real" than the string it is attempting to compress. I think the solution is to simply accept that experiences and physics occupy different slots in our ontology, and that we shouldn't expect to reduce either to the other.
I think I have a pretty good theory of conscious experience, focused on the meta-problem—explaining why it is that we think consciousness is mysterious. Basically, I think the sense of mysteriousness results from our brain considering 'redness' (etc.) to be a primitive data type, one which cannot be defined in terms of any other data. I'm not totally sure yet how to extend the theory to cover valence, but I think a promising way forward might be trying to reverse-engineer how our brain's empathy/other-mind-detection works at an algorithmic level, then extend that to cover a wider class of systems.
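To make the 'primitive data type' idea a bit more concrete, here is a deliberately crude sketch (the class names and introspection interface are purely illustrative, not a claim about actual neural coding): composite concepts can be decomposed on demand, while queries about a primitive bottom out, and that dead end is what gets reported as mysteriousness.

```python
# Toy illustration only: concepts the system can introspect vs. opaque primitives.

class Composite:
    """A concept the system can decompose into other data."""
    def __init__(self, name, parts):
        self.name, self.parts = name, parts
    def explain(self):
        return f"{self.name} is defined in terms of: {', '.join(self.parts)}"

class Primitive:
    """A quale-like token: usable and comparable, but with no accessible internals."""
    def __init__(self, name):
        self.name = name
    def explain(self):
        return f"{self.name} cannot be defined in terms of any other data"

water = Composite("water", ["hydrogen", "oxygen"])
redness = Primitive("redness")

print(water.explain())    # decomposition succeeds
print(redness.explain())  # introspection hits a wall -> reported as 'mysterious'
```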
Hmmm... interesting. So in this picture, human values are less like a single function defined on an internal world model, and more like a 'grand bargain' among many distinct self-preserving mesa-optimizers. I've had vaguely similar thoughts in the past, although the devil is in the details with such proposals (e.g., just how agenty are you imagining these circuits to be? Do they actually have the ability to do means-end reasoning about the real world, or have they just stumbled upon heuristics that seem to work well? What kind of learning is applied to them: supervised, unsupervised, reinforcement?). It might be worth trying to make a very simple toy model laying out all the components. I await your future posts with interest.
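For what it's worth, about the simplest toy model of the 'grand bargain' picture I can imagine looks like this (the sub-agents, their hard-coded preferences, and the weighted-sum aggregation are all illustrative assumptions; a Nash-bargaining product or veto rules would be other natural choices):

```python
# A minimal toy model of the "grand bargain" picture.

class SubAgent:
    """A very dumb 'mesa-optimizer': a fixed heuristic utility over actions,
    with no real means-end reasoning about the world."""
    def __init__(self, name, prefs, weight=1.0):
        self.name, self.prefs, self.weight = name, prefs, weight
    def utility(self, action):
        return self.prefs.get(action, 0.0)

def grand_bargain(agents, actions):
    """Aggregate the sub-agents' preferences by a weighted sum and act on it."""
    return max(actions, key=lambda a: sum(ag.weight * ag.utility(a) for ag in agents))

agents = [
    SubAgent("hunger",    {"eat": 1.0, "socialize": 0.1}),
    SubAgent("status",    {"socialize": 1.0, "work": 0.5}),
    SubAgent("curiosity", {"explore": 1.0, "work": 0.3}),
]
print(grand_bargain(agents, ["eat", "socialize", "work", "explore"]))  # -> "socialize"
```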
The brain seems to have components that are like big neural nets—giant opaque blobs of compute optimized for some reward function. It also seems to have both long- and short-term memory systems which mostly just store information for the neural-net-like systems to manipulate, similar to RAM and a hard drive. If near-term AGI is like this, there will be two types of mesa-optimizer that can arise: optimizers arising somewhere inside the big neural net, and optimizers that arise from an algorithm carried out using the memory systems. The prefrontal cortex may be an example of the former in humans. The implementation of explicit rules to improve decision-making, such as EU maximization or Bayesianism, is an example of the latter (h/t to the ELK report).
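A bare-bones sketch of that two-component picture (the names and the crude cached-EU rule are illustrative assumptions): the first kind of mesa-optimizer would live inside the opaque blob's weights, while the second is the explicit procedure carried out using the memory store.

```python
# Opaque learned "blob" plus an explicit memory store it reads and writes.
import random

class OpaqueBlob:
    """Stand-in for the big neural net: an inscrutable observation/action scorer."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
    def score(self, observation, action):
        return self.rng.random()  # pretend this is a learned value estimate

class Memory:
    """Stand-in for RAM/hard-drive-like storage."""
    def __init__(self):
        self.store = {}

def explicit_eu_rule(blob, memory, observation, actions):
    """An algorithm carried out using the memory system: explicitly record the
    blob's value estimates, then pick the action with the highest one."""
    for a in actions:
        memory.store[a] = blob.score(observation, a)
    return max(memory.store, key=memory.store.get)

blob, memory = OpaqueBlob(), Memory()
print(explicit_eu_rule(blob, memory, observation="obs", actions=["left", "right"]))
```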
You could have another limited AI design a nanofactory to make ultra-fast computers to run the emulations. I think a more difficult problem is getting a limited AI to do neuroscience well. Actually I think this whole scenario is kind of silly, but given the implausible premise of a single AI lab having a massive tech lead over all others, neuroscience may be the bigger barrier.
Yeah. I think this sort of thing is why Eliezer thinks we’re doomed
Hmm, interesting... but wasn't he more optimistic a few years ago, when his plan was still "pull off a pivotal act with a limited AI"? I thought the thing that made him update towards doom was the apparent difficulty of safely making even a limited AI, plus shorter timelines.
other gestured-example I’ve heard is “upload aligned people who think hard for 1000 subjective years and hopefully figure something out.”
Ah, that actually seems like it might work. I guess the problem is that an AI that can do neuroscience well enough to pull this off would have to be pretty general. Maybe a more realistic plan along the same lines would be to try using ML to replicate the functional activity of various parts of the human brain and create 'pseudo-uploads'. Or just try to create an AI with a similar architecture and roughly similar reward function to ours, hoping that human values are more generic than they might appear.
Oh, melting the GPUs would not actually be a pivotal act
Well yeah, that’s my point. It seems to me that any pivotal act worthy of the name would essentially require the AI team to become an AGI-powered world government, which seems pretty darn difficult to pull off safely. The superpowered-AI-propaganda plan falls under this category. The long-lasting nanomachines idea is cute, but I bet people would just figure out ways to evade the nanomachines’ definition of ‘GPU’.
Note that these aren’t intended to be very good/realistic suggestions, they’re just meant to point to different dimensions of the possibility space
Fair enough... but if the pivotal act plan is workable, there should be some member of that space which actually is good/seems like it has a shot of working out in reality (and which wouldn't require a full FAI). I've never heard of any, and I'm having a hard time thinking of one. Now, it could be that MIRI or others think they have a workable plan whose details they don't want to share due to infohazard concerns. But as an outside observer, I have to assign a certain amount of probability to that being self-delusion.
I think it's plausible that an unaligned singularity could lead to things we consider interesting, because human values might be more generic than they appear, with the apparent complexity being an emergent feature resulting from power-seeking and curiosity drives or mesa-optimization. I also think the whole framework of "a singleton with a fixed utility function becomes all-powerful and optimizes that for all eternity" might be wrong, since human values don't seem to work that way.
it’s much more likely that someone could actually perform a unilateral pivotal act; it is a far easier problem, even after accounting for the problems the OP mentions in Part 1.
What I've never understood about the pivotal act plan is exactly what the successful AGI team is supposed to do after melting the GPUs or whatever. Every government on Earth will now consider them their enemy; they will immediately be destroyed unless they can defend themselves militarily, and countries will simply rebuild the GPU factories and continue on as before (except now in a more combative, disrupted, AI-race-encouraging geopolitical situation). So any pivotal act seems to require, at a minimum, an AI capable of defeating all countries' militaries. Then, in order to not have society collapse, you probably need to become the government yourself, or take over or persuade existing governments to go along with your agenda. But an AGI that would be capable of doing all this safely seems... not much easier to create than a full-on FAI? It's not like you could get by with an AI that was freakishly skilled at designing nanomachines but nothing else; you'd need something much more general. But isn't the whole idea of the pivotal act plan that you don't need to solve alignment in full generality to execute a pivotal act? For these reasons, executing a unilateral pivotal act (that actually results in an x-risk reduction) does not seem obviously easier to me than convincing governments.
Definitely agreed that we shouldn’t try to obtain a highly specific definition of human values right now. And that we’ll likely find that better formulations lead to breaking down human values in ways we currently wouldn’t expect.
Seemed like a bit of a rude way to let someone know they had a typo; I would have just gone with "Typo: money brain should be monkey brain".
They’re not opposites, they’re two different ways of analyzing the same situation. Examining the local density matrices at various places, we may find decoherence has occurred, even while the global state is in a coherent superposition.