From my (dxu’s) perspective, it’s allowable for there to be “deep fundamental theories” such that, once you understand those theories well enough, you lose the ability to imagine coherent counterfactual worlds where the theories in question are false.
To use thermodynamics as an example: the first law of thermodynamics (conservation of energy) is actually a consequence of Noether’s theorem, which ties conserved quantities in physics to symmetries in physical laws; in this case, energy conservation follows from the time-translation symmetry of the laws themselves. Before someone becomes aware of this, it’s perhaps possible for them to imagine a universe exactly like our own, except that energy is not conserved; once they understand the connection implied by Noether’s theorem, this becomes an incoherent notion: you cannot remove the conservation-of-energy property without changing deep aspects of the laws of physics.
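(For concreteness, here is a minimal sketch of the special case at issue: the standard textbook derivation that time-translation symmetry of the Lagrangian forces energy conservation. The symbols are the usual ones—L for the Lagrangian, q_i for generalized coordinates—and nothing below is specific to this discussion.)

```latex
% Standard derivation: time-translation symmetry implies energy conservation.
% Take a Lagrangian L(q, \dot q) with no explicit time dependence
% (\partial L / \partial t = 0), and define the energy function E:
\[
E \;=\; \sum_i \frac{\partial L}{\partial \dot q_i}\,\dot q_i \;-\; L
\]
% Differentiating along a trajectory and applying the Euler--Lagrange
% equations, d/dt(\partial L / \partial \dot q_i) = \partial L / \partial q_i:
\[
\frac{dE}{dt}
\;=\; \sum_i \left[\frac{d}{dt}\frac{\partial L}{\partial \dot q_i}
      - \frac{\partial L}{\partial q_i}\right]\dot q_i
      \;-\; \frac{\partial L}{\partial t}
\;=\; 0 .
\]
% The bracket vanishes on solutions of the equations of motion; the last
% term vanishes by the symmetry assumption. Remove the symmetry, and the
% conservation law goes with it.
```

The point being: energy conservation is not a free-floating extra property of our universe; it is the time-translation symmetry itself, viewed through a different lemma.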
The second law of thermodynamics is similarly deep: it’s actually a consequence of statistical mechanics combined with a (low-entropy) boundary condition at the beginning of the universe, with no corresponding (low-entropy) boundary condition at any future state. This asymmetry in boundary conditions is what makes entropy appear to increase in one direction—and again, once someone becomes aware of this, it is no longer possible for them to imagine living in a universe which started out in a very low-entropy state, but where the second law of thermodynamics does not hold.
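(Again for concreteness, a compressed version of the standard Boltzmann-style counting argument; S, k_B, and Ω(M) are the usual textbook symbols, and nothing here is original to this comment.)

```latex
% Boltzmann entropy: S counts the microstates \Omega(M) compatible with
% a given macrostate M.
\[
S(M) \;=\; k_B \ln \Omega(M),
\qquad
\frac{\Omega(M_{\mathrm{high}})}{\Omega(M_{\mathrm{low}})}
\;=\; e^{\left(S_{\mathrm{high}} - S_{\mathrm{low}}\right)/k_B} \;\gg\; 1 .
\]
% High-entropy macrostates occupy astronomically more phase-space volume,
% so a system in a low-entropy macrostate almost certainly wanders into
% higher-entropy ones. The underlying dynamics are time-reversible, so
% this counting argument is symmetric in time by itself; the observed
% arrow comes entirely from the low-entropy boundary condition at one end.
```

Which is exactly the asymmetry described above: condition on a low-entropy past, and the second law follows; there is no further moving part one could remove while keeping the rest of the setup intact.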
In other words, thermodynamics as a “deep fundamental theory” is not merely [what you characterized as] a “powerful abstraction that is useful in a lot of domains”. Thermodynamics is a logically necessary consequence of existing, more primitive notions—and the fact that (historically) we arrived at our understanding of thermodynamics via a substantially longer route (involving heat engines and the like), without noticing this deep connection until much later on, does not change the fact that grasping said deep connection allows one to see “at a glance” why the laws of thermodynamics inevitably follow.
Of course, this doesn’t imply infinite certainty, but it does imply a level of certainty substantially higher than what would be assigned merely to a “powerful abstraction that is useful in a lot of domains”. So the relevant question would seem to be: given my above-described epistemic state, how might one convince me that the case for thermodynamics is not as airtight as I currently think it is? I think there are essentially two angles of attack: (1) convince me that the arguments for thermodynamics being a logically necessary consequence of the laws of physics are somehow flawed, or (2) convince me that the laws of physics don’t have the properties I think they do.
Both of these are hard to do, however—and for good reason! And absent arguments along those lines, I don’t think I am (or should be) particularly moved by [what you characterized as] philosophy-of-science-style objections about “advance predictions”, “systematic biases”, and the like. I think there are certain theories for which the object-level case is strong enough that it more or less screens off meta-level objections; and I think this is right, and good.
Which is to say:
The mental move I’m doing for each of these examples is not imagining universes where addition/evolution/other deep theory is wrong, but imagining phenomena/problems where addition/evolution/other deep theory is not adapted. If you’re describing something that doesn’t commute, addition might be a deep theory, but it’s not useful for what you want. Similarly, you could argue that given how we’re building AIs and trying to build AGI, evolution is not the deep theory that you want to use. (emphasis mine)
I think you could argue this, yes—but the crucial point is that you have to actually argue it. You have to (1) highlight some aspect of the evolutionary paradigm, (2) point out [what appears to you to be] an important disanalogy between that aspect and [what you expect cognition to look like in] AGI, and then (3) argue that that disanalogy directly undercuts the reliability of the conclusions you would like to contest. In other words, you have to do things the “hard way”—no shortcuts.
...and the sense I got from Richard’s questions in the post (as well as from the arguments you made in this subthread) very much smells like an attempted shortcut. This is why I wrote, in my other comment, that
I don’t think I have a good sense of the implied objections contained within Richard’s model. That is to say: I don’t have a good handle on the way(s) in which Richard expects expected utility theory to fail, even conditioning on Eliezer being wrong about the theory being useful. I think this is important because—absent a strong model of expected utility theory’s likely failure modes—I don’t think questions of the form “but why hasn’t your theory made a lot of successful advance predictions yet?” move me very much on the object level.
I think I share Eliezer’s sense of not really knowing what Richard means by “deep fundamental theory” or “wide range of applications we hadn’t previously thought of”, and I think what would clarify this for me would have been for Richard to provide examples of “deep fundamental theories [with] a wide range of applications we hadn’t previously thought of”, accompanied by an explanation of why, if those applications hadn’t been present, that would have indicated something wrong with the theory.
I’m… confused by this framing? Specifically, this bit (as well as other bits like it) seems to be coming at the problem with [something like] a baked-in assumption that prosaic alignment is something that Actually Has A Chance Of Working?
And, like, to be clear, obviously if you’re working on prosaic alignment that’s going to be something you believe[1]. But it seems clear to me that EY/MIRI does not share this viewpoint, and all the disagreements you have regarding their treatment of other avenues of research seem to me to be logically downstream of this disagreement?
I mean, it’s possible I’m misinterpreting you here. But you’re saying things that (from my perspective) only make sense with the background assumption that “there’s more than one game in town”—things like “I wish EY/MIRI would spend more time engaging with other frames” and “I don’t like how they treat lack of progress in their frame as evidence that all other frames are similarly doomed”—and I feel like all of those arguments simply fail in the world where prosaic alignment is Actually Just Doomed, all the other frames Actually Just Go Nowhere, and conceptual alignment work of the MIRI variety is (more or less) The Only Game In Town.
To be clear: I’m pretty sure you don’t believe we live in that world. But I don’t think you can just export arguments from the world you think we live in to the world EY/MIRI thinks we live in; there needs to be a bridging step first, where you argue about which world we actually live in. I don’t think it makes sense to try and highlight the drawbacks of someone’s approach when they don’t share the same background premises as you, and the background premises they do hold imply a substantially different set of priorities and concerns.
It also occurs to me that your frustration could stem from the fact that you can’t actually argue this with EY/MIRI directly, because they don’t frequently make themselves available to discuss things. And if something like that’s the case, then I guess what I want to say is… I sympathize with you abstractly, but I think your efforts are misdirected? It’s okay for you and other alignment researchers to have different background premises from MIRI or even each other, and for you and those other researchers to be working on largely separate agendas as a result? I want to say that’s kind of what foundational research work looks like, in a field where (to a first approximation) nobody has any idea what the fuck they’re doing?
And yes, in the end [assuming somebody succeeds] that will likely mean that a bunch of people’s research directions were ultimately irrelevant. Most people, even. That’s… kind of unavoidable? And also not really the point, because you can’t know which line of research will be successful in advance, so all you have to go on is your best guess, which… may or may not be the same as somebody else’s best guess?
I dunno. I’m trying not to come across as too aggressive here, which is why I’m hedging so many of my claims. To some extent I feel uncomfortable trying to “police” people’s thoughts here, since I’m not actually an alignment researcher… but also it felt to me like your comment was trying to police people’s thoughts, and I don’t actually approve of that either, so...
Yeah. Take this how you will.
[1] I personally am (relatively) agnostic on this question, but as a non-expert in the field my opinion should matter relatively little; I mention this merely as a disclaimer that I am not necessarily on board with EY/MIRI about the doomed-ness of prosaic alignment.