By “analog of one-shot true PD” I meant any game where the Nash equilibrium isn’t Pareto-optimal. The two links in my last comment gave plenty of examples.
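For concreteness, here is a rough sketch of what I mean (toy payoff numbers and helper names of my own, not taken from the linked examples): in the standard PD, the unique Nash equilibrium is mutual defection, and it is Pareto-dominated by mutual cooperation.

```python
# Illustrative sketch only: standard PD payoffs (my own toy numbers).
payoffs = {            # (my move, their move) -> (my payoff, their payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
moves = ["C", "D"]

def is_nash(a, b):
    # Neither player gains by unilaterally deviating from (a, b).
    return (all(payoffs[(a, b)][0] >= payoffs[(x, b)][0] for x in moves)
            and all(payoffs[(a, b)][1] >= payoffs[(a, y)][1] for y in moves))

def pareto_dominated(a, b):
    # Some other outcome leaves at least one player better off and nobody worse off.
    return any(payoffs[o][0] >= payoffs[(a, b)][0]
               and payoffs[o][1] >= payoffs[(a, b)][1]
               and payoffs[o] != payoffs[(a, b)]
               for o in payoffs)

print([(a, b) for a in moves for b in moves if is_nash(a, b)])  # [('D', 'D')]
print(pareto_dominated("D", "D"))                               # True: (C, C) dominates it
```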
Suppose we have an indefinitely iterated PD with an unknown bound and hard-to-calculate but small probabilities of each round being truly unobserved. Do you call that “a game where the Nash equilibrium isn’t a Pareto optimum”? Do you think evolution has handled it by programming us to just defect?
I’ve done some informal psychological experiments to check human conformance with timeless decision theory on variants of the original Newcomb’s Problem, btw, and people who one-box on Newcomb’s Problem seem to have TDT intuitions in other ways. Not that this is at all relevant to the evolutionary dilemmas, which we seem to’ve been programmed to handle by being temptable, status-conscious, and honorable to varying quantitative degrees.
But programming an AI to cooperate with strangers on one-shot true PDs out of a human sense of honor would be the wrong move: our sense of honor isn’t the formal “my C iff (opponent C iff my C)”, so a TDT agent would then defect against us.
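To spell out that condition, here is a rough sketch (toy agent and function names of my own, not the actual TDT formalism) of why an agent that cooperates iff the opponent’s cooperation is conditional on its own will defect against an agent that cooperates out of unconditional honor:

```python
# Illustrative sketch only: "my C iff (opponent C iff my C)".
def opponent_move(opponent, my_move):
    """What the opponent would play, given (a prediction of) my move."""
    return opponent(my_move)

def tdt_move(opponent):
    # Cooperate only if the opponent's cooperation is conditional on mine:
    # they play C when I play C, and D when I play D.
    conditional = (opponent_move(opponent, "C") == "C"
                   and opponent_move(opponent, "D") == "D")
    return "C" if conditional else "D"

# Hypothetical "human honor" agent: cooperates with a stranger regardless of
# whether that cooperation is conditioned on the stranger's choice.
honor_agent = lambda their_move: "C"

# Agent whose cooperation really is conditional (mirrors the predicted move).
mirror_agent = lambda their_move: their_move

print(tdt_move(honor_agent))   # D: the honor agent cooperates unconditionally
print(tdt_move(mirror_agent))  # C: cooperation here is genuinely conditional
```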
I just don’t see human evolution—status, temptation, honor—as being very relevant here. An AI’s decision theory will be, and should be, decided by our intuitions about logic and causality, not about status, temptation, and honor. Honor enters as a human terminal value, not as a decider of the structure of the decision theory.