CEO at Conjecture.
I don’t know how to save the world, but dammit I’m gonna try.
I haven’t read Critch in depth, so I can’t guarantee I’m pointing at the same concept he is. Consider this a bit of an impromptu intuition dump; it might be trivial. No claims of originality for any of these thoughts, and epistemic status: ¯\_(ツ)_/¯
The way I currently think about it is that multi-multi is the “full hard problem”, and single-single is a particularly “easy” (still not easy) special case.
In a way we’re making some simplifying assumptions in the single-single case: that we have one (pseudo-Cartesian) “agent” that has some kind of definite (or at least boundedly complicated) values that can be expressed. This means we kind of have “just” the usual problems of a) expressing/extracting/understanding the values, insofar as that is possible (outer alignment), and b) making sure the agent actually fulfills those values (inner alignment).
Multiple principals then relax this assumption: we don’t have a “single” utility function but multiple ones, which introduces another necessary ingredient, some kind of social choice theory “synthesis function” that can take in all the individual functions and spit out a “super utility function” representing some morally acceptable amalgamation of them (whatever that means). The single-principal case is a simpler special case in which the synthesis function is just the identity function, but that no longer works once you have multiple inputs.
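To make the shape of this concrete, here is a minimal toy sketch. Everything in it is my own illustrative assumption (the names, and the weighted-sum rule, which is just the simplest possible synthesis rule, not a proposal): a synthesis function maps a list of principal utility functions to one amalgamated function, and in the single-principal case it degenerates to the identity.

```python
from typing import Callable, List

UtilityFn = Callable[[object], float]

def synthesize(principals: List[UtilityFn], weights: List[float]) -> UtilityFn:
    """Toy social-choice 'synthesis function': a weighted sum of the
    principals' utility functions. Any real synthesis rule would have to
    make hard moral choices; this just shows the type signature of the
    problem."""
    def super_utility(outcome: object) -> float:
        return sum(w * u(outcome) for w, u in zip(weights, principals))
    return super_utility

# Single-principal special case: synthesis reduces to the identity
# (the "multi-outer" problem disappears).
u = lambda outcome: float(outcome)
single = synthesize([u], [1.0])
assert single(3) == u(3)
```

The point of the sketch is only that the single case is the degenerate one: with one input and unit weight, `synthesize` adds nothing, which is why the problem only appears once you have multiple principals.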
In a very simplistic sense, multi is “harder” because we are introducing an additional “degree of freedom”. So you might argue we have outer alignment, inner alignment and “even-more-outerer alignment” or “multi-outer alignment” (which would be the synthesis problem), and you probably have to make hard (potentially irreconcilable) moral choices for at least the latter (probably for all).
In multi-multi, if the agents serve (or have different levels of alignment towards) different subsets of principals, this adds the additional difficulty of game theory between the different agents and how they should coordinate. We can call that the “multi-inner alignment problem” or something: the question of how to get the amalgamation of competing agents to be “inner aligned” and not blow everything up or get stuck in defect-defect spirals or whatever. (This reminds me a lot of what CLR works on.)
Tbh, I am not sure whether single-multi would be harder than, or different from, single-single just “applied multiple times”. Maybe if the agents have different ideas of what the principal wants they could compete, which seems like a failure of outer alignment, though maybe it would be better cast as a kind of failure of “multi-inner alignment”.
So in summary, I think solutions (insofar as such a thing even exists in an objective fashion, which it may or may not) to the multi-multi problem are a superset of solutions to multi-single, single-multi and single-single. Vaguely: outer alignment = normativity/value learning, inner alignment = principal-agent problem, multi-outer alignment = social choice, multi-inner alignment = game theory, and you need to solve all four to solve multi-multi. If you make certain simplifying assumptions, which correspond to introducing “singles”, you can ignore one or more of these (i.e. a single agent doesn’t need game theory, a single principal doesn’t need social choice).
Or something. Maybe the metaphor is too much of a stretch and I’m seeing spurious patterns.
I am so excited about this research, good luck! I think it’s almost impossible that this won’t turn up at least some interesting partial results, even if the strong versions of the hypothesis don’t work out (my guess would be that you run into some kind of uncomputability or incoherence results in finding an algorithm that works for every environment).
This is one of the research directions that make me the most optimistic that alignment might really be tractable!
This is a great intuition pump, thanks! It makes me appreciate just how, in a sense, weird it is that abstractions work at all. It seems like the universe could just not be constructed this way (though one could then argue that probably intelligence couldn’t exist in such chaotic universes, which is in itself interesting). This makes me wonder if there is a set of “natural abstractions” that are a property of the universe itself, not of whatever learning algorithm is used to pick up on them. Seems highly relevant to value learning and the like.
Great writeup, thanks!
To add to the question of whether kludges and heuristics are part of the theory: I’ve asked the Numenta people in a few AMAs about their work, and they’ve made clear they are working solely on the neocortex (and the thalamus), but the neocortex isn’t the only thing in the brain. It seems clear that the kludge we know from the brain is still present, just maybe not in the neocortex. Limbic or other areas could implement kludge-style shortcuts which bias what the more uniform neocortex learns or outputs. Given my current state of knowledge of neuroscience, the most likely interpretation of this kind of research is that the neocortex is a kind of large unsupervised world model connected to all kinds of other hardcoded, RL or other systems, which all in concert produce human behavior. It might be similar to Schmidhuber’s RNNAI idea, where an RL agent learns to use an unsupervised “blob” of compute to achieve its goals. Something like this is probably happening in the brain, since, at least as far as Numenta’s theories go, there is no reinforcement learning going on in the neocortex, and a neocortex without RL seems, on its own, to contradict how humans work overall.
This was an excellent post, thanks for writing it!
But I think you unfairly dismiss the obvious solution to this madness, and I completely understand why, because it’s not at all intuitive where the problem in the setup of infinite ethics lies. It’s in your choice of proof system and interpretation of mathematics! (Don’t use non-constructive proof systems!)
This is a bit of an esoteric point and I’ve been planning to write a post or even sequence about this for a while, so I won’t be able to lay out the full arguments in one comment, but let me try to convey the gist (apologies to any mathematicians reading this and spotting stupid mistakes I made):
This is where things go wrong. The credence you should assign to seeing a hypercomputer is zero, because a computationally bounded observer can never observe such an object in a way that differentiates it from a finite approximation. As such, you should indeed have a zero percent probability of ever moving into a state in which you have performed such a verification; it is a logical impossibility. Think about what it would mean for you, a computationally bounded approximate Bayesian, to come into a state of belief that you are in possession of a hypercomputer (and not a finite approximation of a hypercomputer, which is just a normal computer; remember, arbitrarily large numbers are still infinitely far away from infinity!). What evidence would you have to observe for this belief? You would need to observe literally infinitely many bits, and your credence of observing infinitely many bits should be zero, because you are computationally bounded! If you yourself are not a hypercomputer, you can never move into the state of believing a hypercomputer exists.
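A toy illustration of that point (entirely my own construction, not anything from the post): any finite transcript of answers you could ever collect from a purported halting oracle is reproduced exactly by a finite lookup table, i.e. by an ordinary computer, so no finite amount of evidence can ever distinguish the two.

```python
def finite_impostor(transcript: dict):
    """Given any finite record of (query -> answer) pairs observed from a
    purported hypercomputer, return an ordinary computable function that
    agrees with it on every observed query."""
    def impostor(query):
        return transcript[query]  # plain table lookup, trivially computable
    return impostor

# Whatever finite evidence you collect...
observed = {"does_P1_halt": True, "does_P2_halt": False}
fake = finite_impostor(observed)
# ...a finite machine matches it perfectly on every query you actually made:
assert all(fake(q) == a for q, a in observed.items())
```

Since the bounded observer only ever sees finitely many query-answer pairs, the likelihood ratio between “hypercomputer” and “finite impostor” never budges from 1.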
This is somewhat analogous to how Solomonoff inductors cannot model a universe containing themselves. Solomonoff inductors are “one step up in the halting hierarchy” from us and cannot model universes that contain “super-infinite objects” like themselves. Similarly, we cannot model universes that contain “merely infinite” objects (and by transitivity, any super-infinite objects either); our Bayesian reasoning does not allow it!
I think the core of the problem is that, unfortunately, modern mathematics implicitly accepts classical logic as its basis of formalization, which is a problem because the Law of Excluded Middle is an implicit halting oracle. The LEM says that every logical statement is either true or false. This makes intuitive sense, but is wrong. If you think of logical statements as programs whose truth value we want to evaluate by executing a proof search, there are in fact three “truth values”: true, false and uncomputable! This is a necessity because any axiom system worth its salt is Turing complete (this is basically what Gödel showed in his incompleteness theorems; he used Gödel numbers to formalize the same idea because Turing machines didn’t exist yet) and therefore has programs that don’t halt. Intuitionistic logic (the logic we tend to formalize type theory and computer science with) doesn’t have this problem of an implicit halting oracle, and in my humble opinion should be used for the formalization of mathematics, on pain of trading infinite universes for an avocado sandwich and a big lizard if we use classical logic.
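Here is the “three truth values” picture as a toy bounded proof search (my own illustrative framing, with made-up names): running a counterexample search for a bounded number of steps can return true, false, or “still unknown”, and no bounded procedure can collapse that third value the way LEM pretends it can.

```python
from enum import Enum

class Verdict(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"  # the search hasn't halted within budget

def bounded_search(check_step, budget: int) -> Verdict:
    """check_step(n) returns True/False if the statement is decided at
    step n, else None. A computationally bounded agent can only ever run
    finitely many steps, so UNKNOWN is an unavoidable third outcome."""
    for n in range(budget):
        result = check_step(n)
        if result is not None:
            return Verdict.TRUE if result else Verdict.FALSE
    return Verdict.UNKNOWN

# Counterexample search for "n*n < 100 for all n": refuted at n = 10.
found = bounded_search(lambda n: False if n * n >= 100 else None, budget=1000)
assert found == Verdict.FALSE

# Counterexample search for "n*n >= n for all n": no counterexample
# exists, so the search never halts; a bounded agent only sees UNKNOWN.
open_q = bounded_search(lambda n: False if n * n < n else None, budget=1000)
assert open_q == Verdict.UNKNOWN
```

LEM amounts to asserting that the second kind of search “really” has a true/false answer anyway, which is exactly the information a halting oracle would provide.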
Note that using constructivist/intuitionistic logic does not mean that “infinities aren’t a thing”; it’s a bit more subtle than that (and something I have admittedly not fully deconfused for myself yet). But basically, the kind of “infinities” that cosmologists talk about are (in my ontology) very different from the “super-infinities” that you get in the limit of hypercomputation. Intuitively, it’s important to differentiate “inductive infinities” (“you need arbitrarily many steps to complete this computation”) from “real infinities” (“the solution only pops out after infinitely many steps have completed”, i.e. a halting oracle).
The difference makes the most sense from the perspective of computational complexity theory. The universe is a “program” in the complexity class PTIME/BQP (BQP is basically just the quantum version of PTIME), which means that you can evaluate the “next state” of the universe with at most PTIME/BQP computation. Importantly, this means that even if the universe is inflationary and “infinite”, you could evaluate the state of any part of it in (arbitrarily large but) finite time. There are no “effects that emerge only at infinity”. The (evaluation of any given state of the) universe halts. This is very different from the kinds of computations a hypercomputer is capable of (and less paradoxical). Which is why I found the following very amusing:
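The “no effects at infinity” point can be illustrated with a toy cellular automaton (my own example, using elementary Rule 110): even on an unbounded tape, the state of any cell after t steps depends only on a finite light cone of initial cells, so it is computable in finite time no matter how big the “universe” is. Here the initial condition is assumed to have finite support (0 everywhere outside the given dict).

```python
def rule110_cell(left: int, center: int, right: int) -> int:
    # Rule 110 update table for one cell of an elementary CA.
    return (110 >> (left * 4 + center * 2 + right)) & 1

def cell_after(initial: dict, pos: int, steps: int) -> int:
    """State of cell `pos` after `steps` updates. Only the finite window
    initial[pos-steps .. pos+steps] matters -- the cell's light cone --
    so the answer arrives after finitely many operations even though the
    tape is unbounded. Edge cells of the window become stale, but their
    errors cannot propagate back to `pos` within the remaining steps."""
    window = {i: initial.get(i, 0) for i in range(pos - steps, pos + steps + 1)}
    for _ in range(steps):
        window = {i: rule110_cell(window.get(i - 1, 0),
                                  window.get(i, 0),
                                  window.get(i + 1, 0))
                  for i in window}
    return window[pos]
```

A hypercomputer's output has no such light-cone structure: by construction, its answer can depend on all infinitely many steps of a computation at once.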
Quite the opposite! Or rather, one of those three things is not like the others. Baby universes are in P/BQP, wormholes are in PSPACE (assuming by wormholes you mean closed timelike curves, which is afaik the default interpretation), and hypercomputers are halting-complete, which is ludicrously, insanely not even remotely like the other two. So in that regard, yes, I think consciousness being equal to cheesy-bread is more likely than finding a hypercomputer!
To be clear, when I say “non-constructive logic is Bad™” I don’t mean that the actual literal symbolic mathematics is somehow malign (of course); it’s the interpretation we assign to it. We think we’re reasoning about infinite objects, but we’re really reasoning about computable weaker versions of those objects, and these are not the same thing. If one is maximally careful with one’s interpretations, this is (theoretically) not a problem, but the difference of interpretation is so subtle that it is very difficult to disentangle in our mere human minds. I think this is at the heart of the problems with infinite ethics: because understanding what the correct mathematical interpretations are is so damn subtle and confusing, we find ourselves in bizarre scenarios that seem contradictory and insane, because we accidentally and naively extrapolate interpretations to objects they don’t belong to.
I didn’t do the best job of formally arguing for my point, and I’m honestly still 20% confused about all of this (at least), but I hope I at least gave some interesting intuitions about why the problem might be in our philosophy of mathematics, not our philosophy of ethics.
P.S. I’m sure you’ve heard of it before, but on the off chance you haven’t, I cannot recommend this wonderful paper by Scott Aaronson highly enough for a crash course in many of these kinds of topics relevant to philosophers.