CDT agents will totally self-modify into agents that cooperate in the twin prisoner’s dilemma, but my understanding is that the thing they self-modify into (called “son of CDT”) behaves differently from, e.g., the thing EDT agents self-modify into.
They will only self-modify to cooperate with twins whose actions are causally downstream of their commitment, right? So a CDT agent will not, for example, self-modify to a policy that does acausal trade with twins outside the lightcone.
Yeah, I am not saying there is 100% convergence in decision-theory land (there also isn’t 100% convergence in epistemology land), but this is very different from saying “I think that decision theory is probably more like values than empirical beliefs”.
Bayesian priors also don’t converge in general; they only converge within classes (most obviously, anything you assign zero probability to is something you will never start believing).
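To make the zero-probability point concrete, here is a minimal sketch. The setup is entirely hypothetical (three candidate coin biases in `HEADS_PROB`, three made-up priors, and a made-up sequence of flips); it is only meant to illustrate the claim above. Two agents whose priors give every hypothesis nonzero weight end up agreeing, while an agent that starts with zero weight on the best hypothesis never moves toward it.

```python
def bayes_update(prior, likelihoods):
    """One Bayesian update over a finite set of hypotheses."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical hypotheses: the coin lands heads with probability 0.2, 0.5, or 0.8.
HEADS_PROB = [0.2, 0.5, 0.8]


def update_on_flips(prior, flips):
    """Update a prior over HEADS_PROB on a sequence of 'H'/'T' observations."""
    posterior = list(prior)
    for flip in flips:
        likelihoods = [h if flip == "H" else 1 - h for h in HEADS_PROB]
        posterior = bayes_update(posterior, likelihoods)
    return posterior


# Made-up data that favours the 0.8 hypothesis.
flips = ["H"] * 40 + ["T"] * 10

agent_a = update_on_flips([0.6, 0.3, 0.1], flips)  # full support, sceptical of 0.8
agent_b = update_on_flips([0.1, 0.3, 0.6], flips)  # full support, favours 0.8
agent_c = update_on_flips([0.5, 0.5, 0.0], flips)  # assigns zero prior to 0.8

print("agent_a:", agent_a)
print("agent_b:", agent_b)
print("agent_c:", agent_c)
```

Agents A and B are in the same class (same support), so the data washes out their different starting points. Agent C is in a different class: multiplying a zero prior by any likelihood leaves it at zero, so it settles on the best hypothesis it can still represent.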
The situation with decision theory seems a priori pretty similar. There is lots of convergence, but it only occurs under various conditions, and the possible theories will probably fall into various classes. My guess is that, as with probability theory, in practice one of those classes will be the one we expect all actual minds to fall into, but that’s very much up in the air and unstudied, and I am not confident of it.[1]
Also, to be clear, the “values are things you choose” thing is of course also only partially true. Any human mind, at least, and probably any AI mind we will create, will only have a very partial representation of its values, one that requires a huge amount of logical reasoning and interplay with decision theory and epistemology to meaningfully unfold into something that could constitute a preference ordering.
So in some sense I am not even sure how to talk about preferences in the absence of decision theory and epistemology, both of which have a lot of structure and convergence and as such will create convergence dynamics in value-space as well. My values are certainly subject to reflection, which depends on my epistemological and decision-theoretic principles, and the same seems true for almost all minds.