Creating some sort of commitment device that would bind us to follow UDT—before we evaluate some set of hypotheses—is an example for one potentially consequential intervention.
As an aside, my understanding is that in environments that involve multiple UDT agents, UDT doesn’t necessarily work well (or is not even well-defined?).
Also, if we would use SGD to train a model that ends up being an aligned AGI, maybe we should figure out how to make sure that that model “follows” a good decision theory. (Or does this happen by default? Does it depend on whether “following a good decision theory” is helpful for minimizing expected loss on the training set?)
Creating some sort of commitment device that would bind us to follow UDT—before we evaluate some set of hypotheses—is an example for one potentially consequential intervention.
As an aside, my understanding is that in environments that involve multiple UDT agents, UDT doesn’t necessarily work well (or is not even well-defined?).
Also, if we would use SGD to train a model that ends up being an aligned AGI, maybe we should figure out how to make sure that that model “follows” a good decision theory. (Or does this happen by default? Does it depend on whether “following a good decision theory” is helpful for minimizing expected loss on the training set?)