Eliezer’s Timeless Decision Theory solution to The Prisoner’s Dilemma is compelling.
It’s something I’ve thought about for a long time. There must be some solution to the bloody thing—my gut instinct tells me to cooperate, even when dealing with a paperclip maximizer, but all of my justifications wind up being little more than mathy ways of saying ‘Honor’. And to be perfectly frank, I’m not convinced that the story’s solution is much more than that either. Just replace “acts honorably” with “holds true to TDT”.
That said, I do hold myself to TDT, because to do otherwise would be dishonorable (honor being a part of my utility function)… but here I’m seeing a chicken-and-egg problem. Is ‘honor’ simply a manifestation of TDT?
I’m presuming that many of you have some thoughts on the matter, so I’m leaving my half-formed ideas here for comment.
Who’d’ve thunk they’d ever read a Harry Potter fanfic and enjoy it?
Well, it doesn’t actually add up to honor. If you’re in a True Prisoner’s Dilemma and you predict that the paperclipper will cooperate out of honor, TDT says to defect and reap the benefits. It’s only when two TDT agents meet that mutual cooperation is on the table.
(Nitpick: a TDT agent and a UDT agent should cooperate with each other as well, etc.)
EDIT: This comment is mistaken. If by HonorBot we mean an agent that predicts what the other agent will do, and then cooperates with all cooperators and defects against all defectors, then TDT indeed cooperates with HonorBot. TDT does not cooperate with CooperateBot, though, so TDT is not HonorBot.
Only if you try to act honorably to the honorable and dishonorably to the dishonorable do you have something like TDT.
And you must do this in a way that makes you appear honorable to others who use the same algorithm.
Reputation effects are one way to change the payoffs so it’s no longer a Prisoner’s Dilemma. But if this particular interaction is more important than the reputation effects, TDT still defects against an honorable paperclipper who isn’t TDT or higher.
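A minimal sketch of the trade-off this comment describes. It assumes the reputation effect can be folded into a single scalar cost on defecting, and that the paperclipper is predicted to cooperate no matter what you do. The payoff numbers and the helper name `defects_against_predicted_cooperator` are hypothetical and only illustrate the comparison.

```python
# Illustrative one-shot PD payoffs (T > R > P > S); these numbers are assumptions.
T, R, P, S = 5, 3, 1, 0


def defects_against_predicted_cooperator(reputation_cost: float) -> bool:
    """Hypothetical helper: defect iff the one-shot temptation payoff, minus the
    future reputation cost of being seen to defect, beats mutual cooperation.

    Assumes the opponent cooperates regardless of our move, so the comparison
    is simply (T - reputation_cost) versus R.
    """
    return T - reputation_cost > R


for cost in (0, 1, 2, 3):
    action = "defect" if defects_against_predicted_cooperator(cost) else "cooperate"
    print(f"reputation cost {cost}: {action}")
```

With these numbers, defection wins only while the reputation cost stays below T - R = 2; once the cost reaches that gap, cooperating is at least as good.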
TDT says: I cooperate iff (you will cooperate iff I cooperate).
Honorable says: I cooperate iff you will cooperate.
It seems to me that, although Honorable is suboptimal if it meets an unconditional cooperator, TDT will cooperate with it because it meets the condition that TDT cares about.
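The two conditions above can be checked mechanically in a toy model. This is a sketch under a loud assumption: each opponent is described only by how it would respond to each of your moves (i.e. it predicts you perfectly), which sidesteps the harder self-referential cases. The function names are hypothetical.

```python
from typing import Callable, Literal

Move = Literal["C", "D"]
# Hypothetical model: an opponent is its response to each of my possible moves,
# assuming it predicts my move perfectly.
Policy = Callable[[Move], Move]


def cooperate_bot(my_move: Move) -> Move:
    return "C"  # unconditional cooperator


def defect_bot(my_move: Move) -> Move:
    return "D"  # unconditional defector


def honorable(my_move: Move) -> Move:
    return my_move  # "I cooperate iff you will cooperate"


def tdt_cooperates(opponent: Policy) -> bool:
    """'I cooperate iff (you will cooperate iff I cooperate)'."""
    return opponent("C") == "C" and opponent("D") == "D"


for name, bot in [("CooperateBot", cooperate_bot),
                  ("DefectBot", defect_bot),
                  ("Honorable", honorable)]:
    print(name, "C" if tdt_cooperates(bot) else "D")
```

In this model Honorable is the only one of the three whose response satisfies both halves of the biconditional, so it is the only one TDT cooperates with; CooperateBot fails the "you defect if I defect" half.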
On reflection, your conclusion is obviously right: playing PD against HonorBot is simply playing Newcomb’s Problem, so TDT cooperates.
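The Newcomb framing can be sketched as a choice where the opponent's move is treated as determined by (a prediction of) yours. This assumes illustrative payoffs and models HonorBot as a perfect predictor that mirrors your move; `best_move`, `honor_bot` and `cooperate_bot` are hypothetical names.

```python
from typing import Callable, Literal

Move = Literal["C", "D"]

# Illustrative payoffs for me, keyed by (my move, their move).
# T=5 > R=3 > P=1 > S=0 are assumed values, not taken from the thread.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def best_move(opponent: Callable[[Move], Move]) -> Move:
    """Newcomb-style choice: the opponent's move depends on (a prediction of)
    mine, so pick the move whose resulting outcome pays me the most."""
    return max(("C", "D"), key=lambda m: PAYOFF[(m, opponent(m))])


def honor_bot(my_move: Move) -> Move:
    return my_move  # perfect predictor: cooperates iff I cooperate


def cooperate_bot(my_move: Move) -> Move:
    return "C"  # cooperates regardless of my move


print(best_move(honor_bot))      # C: 3 for (C, C) beats 1 for (D, D)
print(best_move(cooperate_bot))  # D: 5 for (D, C) beats 3 for (C, C)
```

Against the mirror the only reachable outcomes are (C, C) and (D, D), exactly like one-boxing versus two-boxing against a reliable predictor; against the unconditional cooperator defection simply dominates.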
I was misled by the recent realization that TDT doesn’t actually work out to “I cooperate iff (you will cooperate iff I cooperate)”.