Why does this pose an issue for reinforcement learning? Forgive my ignorance, I do not have a background in the subject. I don’t believe I have any information that distinguishes cereal from granola in terms of which has worse highest-severity consequences (given how small those numbers are and my inability to conceive of them, I strongly suspect anything I could come up with would exclusively represent epistemic and not aleatoric uncertainty). But even if I accept that I do, the theory would tell me, correctly, that I should act based on that level. If that seems wrong, then it’s evidence we’ve incorrectly identified an implicit severity class in our imagination of the hypothetical, not that severity classes are incoherent. That is, if I really have reason to believe that eating cereal even slightly increases the chance of Universe Destruction compared to eating granola, shouldn’t that make my decision for me?
I would argue that many actions are sufficiently isolated that, while they’ll certainly have high-severity ripple effects, we have no reason to believe that in expectation those high-severity consequences are worse than they would have been for a different action.
If the non-Archimedean framework really does “collapse” to an Archimedean one in practice, that’s fine with me. It exists to respond to a theoretical question about qualitatively different forms of utility, without biting a terribly strange bullet. Collapsing the utility function itself, though, would mean assigning weight 0 to all but the maximal severity level, which seems very bad: we certainly prefer no dust specks in our eyes to dust specks (ceteris paribus), and this should be accurately reflected in our evaluation of world states. That holds even if the ramped function does lead to the same action preferences in many/most real-life scenarios for a sufficiently discerning agent (which maybe AI will be, but I know I am not).
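To make the distinction concrete, here is a rough sketch of what I have in mind. It assumes utilities can be written as tuples ordered from the highest severity level down to the lowest and compared lexicographically in expectation. The names (`expected_utility`, `prefer`, `collapse`) and the three-level encoding are purely illustrative, not part of any established formalism.

```python
from typing import Sequence, Tuple

# Hypothetical encoding: a utility is a tuple ordered from the highest
# severity level to the lowest, e.g. (catastrophe, sting, dust_speck).
Utility = Tuple[float, ...]
Lottery = Sequence[Tuple[float, Utility]]  # (probability, utility) pairs


def expected_utility(lottery: Lottery) -> Utility:
    """Take the expectation separately at each severity level."""
    levels = len(lottery[0][1])
    return tuple(sum(p * u[i] for p, u in lottery) for i in range(levels))


def prefer(a: Lottery, b: Lottery) -> int:
    """Return 1 if a is preferred, -1 if b is, 0 if indifferent."""
    ea, eb = expected_utility(a), expected_utility(b)
    # Python compares tuples lexicographically: a lower level only matters
    # when every higher level is exactly tied.
    return (ea > eb) - (ea < eb)


def collapse(lottery: Lottery) -> Lottery:
    """Assumed 'collapsed' variant: weight 0 on all but the top level."""
    return [(p, (u[0],) + (0.0,) * (len(u) - 1)) for p, u in lottery]


# Two worlds that are tied at every level except dust specks.
no_speck = [(1.0, (0.0, 0.0, 0.0))]
speck = [(1.0, (0.0, 0.0, -1.0))]

# A gamble with a top-level downside versus a sure dust speck.
risky = [(0.5, (-1.0, 0.0, 0.0)), (0.5, (0.0, 0.0, 5.0))]

print(prefer(no_speck, speck))                      # full ordering
print(prefer(collapse(no_speck), collapse(speck)))  # collapsed ordering
print(prefer(speck, risky))                         # top level decides
```

With the full ordering the speck-free world is preferred, while the collapsed version is indifferent between the two, which is exactly the evaluation I would not want to lose. In the last comparison the top severity level settles the matter regardless of what happens at the dust-speck level.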
If we had infinite compute, that would not eliminate empirical uncertainty. There are many things you cannot compute because you just don’t have enough information. This is why in learning theory sample complexity is distinct from computational complexity, and applies to algorithms with unlimited computational resources. So, you would definitely still need to take expectations.
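A toy illustration of the sample-complexity point, assuming the unknown quantity is the bias of a coin (a stand-in I am introducing here, not something from the discussion above). The bound used is Hoeffding's inequality; the function names are hypothetical.

```python
import math
import random


def hoeffding_samples(epsilon: float, delta: float) -> int:
    """Number of samples sufficient to estimate a [0, 1]-bounded mean to
    within epsilon with probability at least 1 - delta (Hoeffding)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_bias(true_bias: float, n: int, rng: random.Random) -> float:
    """Empirical frequency of heads after n flips of a coin whose bias
    the estimator does not get to see."""
    return sum(rng.random() < true_bias for _ in range(n)) / n


rng = random.Random(0)
for n in (10, 1000, hoeffding_samples(0.01, 0.05)):
    # No amount of extra computation improves these estimates;
    # only more flips (more data) does.
    print(n, estimate_bias(0.3, n, rng))
```

The required number of flips depends only on the accuracy and confidence you ask for, not on how fast you can compute, which is why an agent with unlimited compute still has to reason in expectation over what it has not observed.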
Thanks for letting me know about this! Another thing I haven’t studied.
I agree that delineating the precise boundaries of comparability classes is a uniquely challenging task. Nonetheless, that difficulty does not mean the classes don’t exist. To me your claim feels along the same lines as the classical induction “paradoxes” about classifying sand heaps. While it’s difficult to define exactly what a sand heap is, we can look at many objects and say with certainty whether or not they are sand heaps, and that’s what matters for living in the world and making empirical claims (or building sandcastles anyway).
I suspect it’s quite likely that the experiences you may be referring to as “higher quantities of themselves” within a single person are in fact qualitatively different, and in many cases no longer comparable utilities. Consider the dust specks: they are assumed to be minimally annoying and almost undetectable to the bespeckèd. However, if we even slightly upgrade them so as to cause a noticeable sting in the targeted eye, they appear to reach a whole different level. I’d rather spend my life plagued by barely noticeable specks (assuming they have no interactions) than have one slightly burn my eyeball.