You make a very interesting point regarding: “the act of seriously considering the antisocial action is what causes you to make the microexpression that makes others not trust you”.
However, I’m more skeptical of your claim that “managing your own ‘poker face’ is just about causally manipulating your expressions so that you can send signals that profit you”. This is something that you can try to do. You can intentionally think about checking in order to try to convince them you’re not going to fold, but this isn’t exactly the same thing as if you definitely weren’t going to fold. It’s true that you usually have some additional causal levers, but none of them are exactly the same as being the kind of person who does X.
Regarding outperforming CDT, if CDT agents often modify themselves to become an LDT/FDT agent then it would broadly seem accurate to say that CDT is getting outcompeted.
My guess would be that agents self-modifying into such agents would be the primary way MIRI folks expect we’d end up with LDT/FDT agents[1]. I tend to be more interested in understanding these kinds of decision theory problems in and of themselves, and not just as something an AI modifies itself to be able to handle, but I feel like I’m taking something of a minority position here.
“agents self-modifying [...] the primary way [...] we’d end up with LDT/FDT agents [...] Okay son-of-CDT”
Not really, the thing that adopts a decision theory probably didn’t have a clear position on adhering to CDT before that. Some spiritual successor to FDT could be the first clear resolution on its behavior that’s decision theory shaped.
“It’s true that you usually have some additional causal levers, but none of them are exactly the same as being the kind of person who does X.”
Not sure I understand. It seems like “being the kind of person who does X” is a habit you cultivate over time, which causally influences how people react to you. Seems pretty analogous to the job candidate case.
“if CDT agents often modify themselves to become an LDT/FDT agent then it would broadly seem accurate to say that CDT is getting outcompeted”
See my replies to interstice’s comment. I don’t think “modifying themselves to become an LDT/FDT agent” is what’s going on; at least, there doesn’t seem to be pressure for CDT agents to modify themselves to do all the sorts of things LDT/FDT agents do. The self-modified agent and a true LDT/FDT agent come apart in cases where the modification doesn’t causally influence another agent’s behavior.
(This seems analogous to claims that consequentialism is self-defeating because the “consequentialist” decision procedure leads to worse consequences on average. I don’t buy those claims, because consequentialism is a criterion of rightness, and there are clearly some cases where doing the non-consequentialist thing is a terrible idea by consequentialist lights even accounting for signaling value, etc. It seems misleading to call an agent a non-consequentialist if everything they do is ultimately optimizing for achieving good consequences ex ante, even if they adhere to some rules that have a deontological vibe and in a given situation may be ex post suboptimal.)
Attempting to cultivate a habit is not the same as directly being that kind of person. The distinction may seem slight, but it’s worth keeping track of.
[1] Okay, son-of-CDT, but this is typically just a technicality.
Not really, the thing that adopts a decision theory probably didn’t have a clear position on adhering to CDT before that. Some spiritual successor to FDT could be the first clear resolution on its behavior that’s decision theory shaped.
Good point.
Thanks!