Surely the important thing is that it will self-modify to whatever decision theory has the best consequences?
The new algorithm will not be exactly TDT because, unlike TDT, it won’t try to change decisions that have already been made. In particular, this means that there’s no risk from Roko’s basilisk.
Disclaimer: I’m not very confident of anything I say about decision theory.