But also, beyond all of that, the arguments around decision theory are, I think, just true in the kind of boring way that physical facts about the world are true, and saying that people will have the wrong decision theory in the future sounds to me about as mistaken as saying that lots of people will disbelieve the theory of evolution in the future. It’s clearly the kind of thing you update on as you get smarter.
This seems way overconfident:
1. Our current DT candidates (FDT/UDT) may not be correct, or even on the right track.
2. Smart != philosophically competent. (There are literally millions of people in China with higher IQ than me, but AFAIK none of them invented something like FDT/UDT or is very interested in it.)
3. We have little idea what FDT/UDT say about what is normatively correct in this situation, or more generally about potential acausal interaction between a human and a superintelligence, partly because we can’t run the math, and partly because we don’t even know what the math is, since we can’t formalize these theories.
Another way to put it is that globally we’re a bubble (people who like FDT/UDT) within a bubble (analytic philosophy tradition) within a bubble (people who are interested in any kind of philosophy), and then even within this nested bubble there’s a further split/disagreement about what FDT/UDT actually says about this specific kind of game/interaction.
(1) seems like evidence in favor of what I am saying. Inasmuch as we are not confident in our current DT candidates, it seems like we should expect future, much smarter people to be more correct. Us getting DT wrong is evidence that getting it right is less dependent on incidental details of the people thinking about it.
(2) I mean, there are also literally millions of people in China with higher IQ than you who believe in spirits, and millions in the rest of the world who believe in the Christian god and disbelieve evolution. The correlation between correct DT and intelligence seems about as strong as it does for the theory of evolution (meaning reasonably strong in the human range, but the human range is narrow enough not to overdetermine the correct answer, especially when you don’t have any reason to think hard about it).
(3) I am quite confident that pure CDT, which rules out retrocausal incentives, is false. I agree that I do not know what the right way to run the math is to understand when retrocausal incentives work, and how important they are. But I really don’t have much uncertainty on this, so I don’t really get your point here. I don’t need to formalize these theories to make a confident prediction that any decision theory that says rewarding people after the fact for one-time actions cannot provide an incentive is false.
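To make what I mean by “running the math” concrete, here’s a toy sketch in Python of the kind of game we’re arguing about. Everything in it (the payoff numbers, the two-step structure, the function names) is made up for illustration; it is not the actual FDT/UDT math, which, as you say, we can’t write down. It just shows why a pure-CDT future never pays the retroactive reward, while a policy-level evaluation of the same game can.

```python
# Toy model of "rewarding people after the fact for one-time actions".
# A present-day human chooses whether to pay a cost to take a helpful
# one-time action; a future civilization then chooses whether to pay
# them a retroactive reward. All numbers are hypothetical.

COST = 1.0      # cost to the human of the helpful one-time action
REWARD = 5.0    # retroactive reward the future could pay the human
PAY_COST = 0.1  # cost to the future civilization of paying that reward
GAIN = 100.0    # value to the future of the human having helped


def human_acts(predicted_policy_pays: bool) -> bool:
    """The human helps iff they expect the predicted future policy to reward them."""
    expected_value = (REWARD if predicted_policy_pays else 0.0) - COST
    return expected_value > 0


def cdt_future_choice(human_helped: bool) -> bool:
    """Pure CDT: the helping action is already in the past, so paying the reward
    only subtracts PAY_COST and changes nothing causally. Never pay."""
    return False


def policy_level_choice() -> bool:
    """Policy-level evaluation: compare the two *policies*, taking into account
    that the human's prediction tracks whichever policy is actually chosen."""
    def outcome(policy_pays: bool) -> float:
        helped = human_acts(predicted_policy_pays=policy_pays)
        return (GAIN if helped else 0.0) - (PAY_COST if (policy_pays and helped) else 0.0)
    return outcome(True) > outcome(False)


if __name__ == "__main__":
    # CDT future: never pays, so a human who predicts this never helps.
    print("CDT policy pays:", cdt_future_choice(human_helped=True))   # False
    print("Human helps under CDT prediction:", human_acts(False))     # False

    # Policy-level future: paying is the better policy, so the incentive bites.
    pays = policy_level_choice()
    print("Policy-level choice pays:", pays)                          # True
    print("Human helps under that prediction:", human_acts(pays))     # True
```

Obviously the real situation isn’t a two-by-two game with hand-picked payoffs, but this is the shape of the asymmetry I’m pointing at: the CDT-style future never pays, so nobody who predicts it has any incentive to help, while the policy-level choice pays as long as the reward is cheap relative to what the help was worth.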
Suppose we rule out pure CDT. That still leaves “whatever the right DT is (even if it’s something like FDT/UDT), if you actually run the math on it, it says that rewarding people after the fact for one-time actions provides practically zero incentives (if ‘people’ means pre-singularity humans)”. I don’t see how we can confidently rule this out.
Yep, agree this is possible (though pretty unlikely), but I was just invoking this stuff to argue against pure CDT (or the equivalent decision theories, which Thomas was saying would imply that rewarding people after the fact can’t be effective).
Or to phrase it a different way: I am very confident that future, much smarter people will not believe in decision theories that rule out retrocausal incentives as a class. I am reasonably confident, though not totally confident, that de facto retrocausal incentives will bite for currently alive humans. This overall makes me think it’s something like 70% likely that, if we make it through the singularity well, future civilizations will spend a decent amount of resources aligning incentives retroactively.
This isn’t super confident, but you know, somewhat more likely than not.