I think we disagree primarily on 2 (and also how doomy the default case is, but let’s set that aside).
In claiming that rationality is as real as reproductive fitness, I’m claiming that there’s a theory of evolution out there.
I think that’s a crux between you and me. I’m no longer sure if it’s a crux between you and Richard. (ETA: I shouldn’t call this a crux, I wouldn’t change my mind on whether MIRI work is on-the-margin more valuable if I changed my mind on this, but it would be a pretty significant update.)
Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.
Yeah, I was ignoring that sort of stuff. I do think this post would be better without the evolutionary fitness example because of this confusion. I was imagining the “unreal rationality” world to be similar to what Daniel mentions below:
I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.
But, separately, I don’t get how you’re seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they’re separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution—without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.
Yeah, I’m going to try to give a different explanation that doesn’t involve “realness”.
When groups of humans try to build complicated stuff, they tend to do so using abstraction. The most complicated stuff is built on a tower of many abstractions, each sitting on top of lower-level abstractions. This is most evident (to me) in software development, where the abstraction hierarchy is staggeringly large, but it applies elsewhere, too: the low-level abstractions of mechanical engineering are “levers”, “gears”, “nails”, etc.
A pretty key requirement for abstractions to work is that they need to be as non-leaky as possible, so that you do not have to think about them as much. When I code in Python and I write “x + y”, I can assume that the result will be the sum of the two values, and this is basically always right. Notably, I don’t have to think about the machine code that deals with the fact that overflow might happen. When I write in C, I do have to think about overflow, but I don’t have to think about how to implement addition at the bitwise level. This becomes even more important at the group level, because communication is expensive, slow, and low-bandwidth relative to thought, and so you need non-leaky abstractions so that you don’t need to communicate all the caveats and intuitions that would accompany a leaky abstraction.
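To make the Python vs. C contrast concrete, here is a minimal sketch. It assumes Python's built-in arbitrary-precision integers and mimics C's unsigned 32-bit addition by masking. The helper names `add_python` and `add_uint32` are hypothetical, chosen only for illustration. (Signed overflow in C is undefined behavior, so the unsigned case is used here because its wraparound is well defined.)

```python
UINT32_MASK = 0xFFFFFFFF  # 2**32 - 1


def add_python(x: int, y: int) -> int:
    """Plain Python addition: ints are arbitrary precision, so no overflow."""
    return x + y


def add_uint32(x: int, y: int) -> int:
    """Mimic C's uint32_t addition, which wraps around modulo 2**32."""
    return (x + y) & UINT32_MASK


big = 4_000_000_000  # fits in 32 unsigned bits, but big + big does not

print(add_python(big, big))  # 8000000000: the abstraction "x + y is the sum" holds
print(add_uint32(big, big))  # 3705032704: the fixed-width abstraction leaks
```

In the first function the caller never has to think about word size. In the second, the caller has to carry the caveat "unless the sum exceeds 2**32 - 1" around with every use, which is exactly the kind of leak that is expensive to communicate.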
One way to operationalize this is that to be built on, an abstraction must give extremely precise (and accurate) predictions.
It’s fine if there’s some complicated input to the abstraction, as long as that input can be estimated well in practice. This is what I imagine is going on with evolution and reproductive fitness: if you can estimate reproductive fitness, then you can get very precise and accurate predictions, as with e.g. the Price equation that Daniel mentioned. (And you can estimate fitness, either by using things like the Price equation + real data, or by controlling the environment so that you set the conditions that make something reproductively fit.)
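For reference, the Price equation in its usual textbook form is sketched below. The notation is my own choice rather than anything from the thread: $z_i$ is the trait value of type $i$, $w_i$ its fitness (number of offspring), $\bar{w}$ the population mean fitness, $\Delta z_i$ the change in trait value between parents of type $i$ and their offspring, and $\Delta \bar{z}$ the change in the population mean trait over one generation.

$$
\Delta \bar{z} \;=\; \frac{1}{\bar{w}} \operatorname{Cov}(w_i, z_i) \;+\; \frac{1}{\bar{w}} \operatorname{E}\!\left(w_i \, \Delta z_i\right)
$$

The first term is the selection effect and the second is the transmission effect. The point for this discussion is that once the fitnesses $w_i$ are estimated, the equation gives an exact accounting of the change in the mean trait, not a rough tendency.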
If a thing cannot provide extremely precise and accurate predictions, then I claim that humans mostly can’t build on top of it. We can use it to make intuitive arguments about things very directly related to it, but can’t generalize it to something more far-off. Some examples from these comment threads of what “inferences about directly related things” looks like:
current theories about why England had an industrial revolution when it did
[biology] has far more practical consequences (thinking of medicine)
understanding why overuse of antibiotics might weaken the effect of antibiotics [based on knowledge of evolution]
Note that in all of these examples, you can more or less explain the conclusion in terms of the thing it depends on. E.g., you can say “overuse of antibiotics might weaken the effect of antibiotics because the bacteria will evolve / be selected to be resistant to the antibiotic”.
In contrast, for abstractions like “logic gates”, “assembly language”, “levers”, etc, we have built things like rockets and search engines that certainly could not have been built without those abstractions, but nonetheless you’d be hard pressed to explain e.g. how a search engine works if you were only allowed to talk with abstractions at the level of logic gates. This is because the precision afforded by those abstractions allows us to build huge hierarchies of better abstractions.
So now I’d go back and state our crux as:
Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?
I would guess not. It sounds like you would guess yes.
I think this is upstream of 2. When I say I somewhat agree with 2, I mean that you can probably get a theory of rationality that makes imprecise predictions, which allows you to say things about “directly relevant things”, which will probably let you say some interesting things about AI systems, just not very much. I’d expect that, to really affect ML systems, given how far away from regular ML research MIRI research is, you would need a theory that’s precise enough to build hierarchies with.
(I think I’d also expect that you need to directly use the results of the research to build an AI system, rather than using it to inform existing efforts to build AI.)
(You might wonder why I’m optimistic about conceptual ML safety work, which is also not precise enough to build hierarchies of abstraction. The basic reason is that ML safety is “directly relevant” to existing ML systems, and so you don’t need to build hierarchies of abstraction—just the first imprecise layer is plausibly enough. You can see this in the fact that there are already imprecise concepts that are directly talking about safety.)
The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.g., in computer security you don’t usually need exact models of attackers, and a system which relies on those is less likely to be secure.
Your few assumptions need to talk about the system you actually build. On the model I’m outlining, it’s hard to state the assumptions for the system you actually build, and near-impossible to be very confident in those assumptions, because they are (at least) one level of hierarchy higher than the (assumed imprecise) theory of rationality.
I generally like the re-framing here, and agree with the proposed crux.
I may try to reply more at the object level later.
Abram, did you reply to that crux somewhere?