Newcomblike problems occur whenever knowledge about what decision you will make leaks into the environment. The knowledge doesn’t have to be 100% accurate, it just has to be correlated with your eventual actual action.
This is far too general. The way in which information is leaking into the environment is what separates Newcomb’s problem from the smoking lesion problem. For your argument to work you need to argue that whatever signals are being picked up on would change if the subject changed their disposition, not merely that these signals are correlated with the disposition.
Relatedly, with your interview example, I think a better model is that whether a person is confident or shy depends not on whether they believe they will be bold, but on how much they care about being laughed at. If you are confident, you don’t care about being laughed at and might as well be bold. If you are afraid of being laughed at, you already know that you are shy and thus gain nothing by being bold.
I think my bigger point is that you don’t seem to make any real argument as to which case we are in. For example, consider the following model of how people’s perception of my trustworthiness might be correlated with my actual trustworthiness:
There are two causal chains:
My values → Things I say → People’s perceptions
My values → My actions
So if I value trustworthiness, I will not, for example, talk much about wanting to avoid being a sucker (in contexts where that would mean doing trustworthy things). This will influence people’s perceptions of whether or not I am trustworthy. Furthermore, if I do value trustworthiness, I will want to be trustworthy.
This setup makes things look very much like the smoking lesion problem. A CDT agent that values trustworthiness will be trustworthy because they place intrinsic value on it. A CDT agent that does not value trustworthiness will be perceived as untrustworthy. Simply changing their actions will not alter this perception, and therefore they will fail to be trustworthy in situations where being untrustworthy benefits them, and this is the correct decision.
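To make the structure of this model concrete, here is a minimal Python sketch of the two causal chains. Everything in it is an illustrative assumption rather than part of the argument itself: the function name `simulate`, the `forced_action` parameter (standing in for a CDT-style intervention on the action alone), and the 10% noise on how speech reveals values. It only shows that, under this model, intervening on the action leaves perception untouched, because perception is downstream of values and not of actions.

```python
import random


def simulate(values_trust, forced_action=None, trials=10_000, noise=0.1, seed=0):
    """Return (fraction perceived as trustworthy, fraction of trustworthy actions).

    Hypothetical model: perception depends only on speech, speech depends only
    on values, and actions depend only on values unless we intervene.
    """
    rng = random.Random(seed)
    perceived = 0
    acted = 0
    for _ in range(trials):
        # Chain 1: values -> things I say -> people's perceptions (noisy signal).
        speech_signals_trust = values_trust if rng.random() >= noise else not values_trust
        perceived += speech_signals_trust
        # Chain 2: values -> actions, unless the action is set by intervention.
        action = values_trust if forced_action is None else forced_action
        acted += action
    return perceived / trials, acted / trials


# An agent that does not value trustworthiness, with and without forcing
# trustworthy actions. The first element of each pair is the perception rate.
baseline = simulate(values_trust=False)
intervened = simulate(values_trust=False, forced_action=True)
print(baseline, intervened)
```

In this sketch the perception rate is identical in both runs, while an agent with `values_trust=True` is perceived as trustworthy far more often. A Newcomb-style model would instead need an arrow from the action (or the disposition producing it) into perception, which is exactly the link the argument would have to establish.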
Now you might try to break the causal link:
My values → Things that I say
And doing so is certainly possible (I mean, you can have spies who successfully pretend to be loyal for extended periods without giving themselves away). On the other hand, it might not happen often, for several possible reasons:
A) Maintaining a facade at all times is exhausting (and thus imposes high costs)
B) Lying consistently is hard (as in too computationally expensive)
C) The right way to lie consistently is to simulate the altered value set, but this may actually lead to changing your values (standard advice for becoming more confident is to pretend to be confident, right?).
So yes, in this model a non-trust-valuing, self-modifying CDT agent will self-modify, but it will need to self-modify its values rather than its decision theory. Adopting a decision theory that acts trustworthy without intrinsically valuing trustworthiness doesn’t help.
This is far too general. The way in which information is leaking into the environment is what separates Newcomb’s problem from the smoking lesion problem. For your argument to work you need to argue that whatever signals are being picked up on would change if the subject changed their disposition, not merely that these signals are correlated with the disposition.
Right you are. Edited for clarity.