I don’t think transitivity is a reasonable assumption.
Suppose an agent is composed of simpler submodules—this, to a very rough approximation, is how actual brains seem to function—and its expressed preferences (i.e. actions) are assembled by polling its submodules.
Bam, voting paradox. Transitivity is out.
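A minimal sketch of that toy model, using the sundae/celery/starve options from later in the thread and made-up submodule names: pairwise majority polling over purely ordinal rankings yields a Condorcet cycle.

```python
# Minimal sketch of the toy model: three hypothetical submodules, each with
# a purely ordinal ranking, polled by pairwise majority vote.
from itertools import combinations

submodules = {
    "hunger":  ["sundae", "celery", "starve"],
    "diet":    ["celery", "starve", "sundae"],
    "comfort": ["starve", "sundae", "celery"],
}

def poll(a, b):
    """Return whichever of a, b a majority of submodules ranks higher."""
    votes_for_a = sum(r.index(a) < r.index(b) for r in submodules.values())
    return a if votes_for_a > len(submodules) / 2 else b

for a, b in combinations(["sundae", "celery", "starve"], 2):
    print(f"{a} vs {b}: {poll(a, b)}")
# sundae vs celery: sundae
# sundae vs starve: starve
# celery vs starve: celery
# The expressed pairwise preferences form a cycle, so no transitive
# ordering is consistent with them.
```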
Neural signals represent things cardinally rather than ordinally, so those voting paradoxes probably won’t apply.
Even conditional on humans not having transitive preferences even in an approximate sense, I find it likely that it would be useful to come up with some ‘transitivization’ of human preferences.
Agreed that there’s a good chance that game-theoretic reasoning about interacting submodules will be important for clarifying the structure of human preferences.
Neural signals represent things cardinally rather than ordinally
I’m not sure what you mean by this. In the general case, resolution of signals is highly nonlinear, i.e. vastly more complicated than any simple ordinal or weighted ranking method. Signals at synapses are nearly digital, though: to first order, a synapse is either firing or it isn’t. Signals along individual nerves are also digital-ish—bursts of high-frequency constant-amplitude waves interspersed with silence.
My point, though, is that it’s not reasonable to assume that transitivity holds axiomatically when it’s simple to construct a toy model where it doesn’t.
On a macro level, I can imagine a person with dieting problems preferring starving > a hot fudge sundae, celery > starving, and a hot fudge sundae > celery.
My experience is that this is generally because of a measurement problem, not a reflectively endorsed statement.
Well, it’s clearly pathological in some sense, but the space of actions to be (pre)ordered is astronomically big and reflective endorsement is slow, so you can’t usefully error-check the space that way. cf. Lovecraft’s comment about “the inability of the human mind to correlate all its contents”.
I don’t think it will do to simply assume that an actually instantiated agent will have a transitive set of expressed preferences. A bit like assuming your code is bug-free.
The agent is allowed to ask its submodules how they would feel about various gambles, e.g. “Would you prefer B, or a 50% probability of A and a 50% probability of C?” Equipped with this extra information, a voting paradox can be avoided. This is because the preferences over gambles tell you not just the order in which the submodule would rank the candidates, but quantitatively how much it cares about each of them.
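As a rough sketch of how that elicitation could work (the `prefers_gamble` query interface and the binary search are assumptions for illustration, following the standard von Neumann-Morgenstern calibration):

```python
# Sketch of eliciting a cardinal utility from gamble questions. Fix
# u(A) = 1 and u(C) = 0, then binary-search for the probability p at which
# the submodule is indifferent between B for sure and "A with probability p,
# else C". That indifference point is u(B).

def calibrate(prefers_gamble, steps=20):
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if prefers_gamble(mid):
            hi = mid  # gamble already preferred: indifference point is lower
        else:
            lo = mid  # B still preferred: indifference point is higher
    return (lo + hi) / 2

# Toy submodule whose (hidden) utility for B happens to be 0.7:
print(round(calibrate(lambda p: p > 0.7), 3))  # ~0.7
```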
Assuming the submodules are rational (which they had better be if we want the overall agent to be rational), their preferences over gambles can be expressed as utility functions on the outcomes. The main agent can then make its own utility function a weighted sum of theirs, which avoids non-transitivity.
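And a sketch of the aggregation step itself, with illustrative weights and utility numbers, showing that the weighted sum produces a transitive ordering where the ordinal poll produced a cycle:

```python
# Sketch of the aggregation step, with illustrative weights and utilities
# (the numbers are made up, not measured from anything). A weighted sum of
# real-valued utilities induces a total ordering, hence no cycles.
utilities = {
    "hunger":  {"sundae": 1.0, "celery": 0.4, "starve": 0.0},
    "diet":    {"celery": 1.0, "starve": 0.6, "sundae": 0.0},
    "comfort": {"starve": 1.0, "sundae": 0.3, "celery": 0.0},
}
weights = {"hunger": 0.5, "diet": 0.3, "comfort": 0.2}

def aggregate(option):
    return sum(weights[m] * utilities[m][option] for m in utilities)

for option in sorted(["sundae", "celery", "starve"], key=aggregate, reverse=True):
    print(option, round(aggregate(option), 2))
# sundae 0.56
# celery 0.5
# starve 0.38
```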
A preference order which says just what order the candidates come in is called an “ordinal utility function”.
A utility function that actually describes the relative values of the candidates is a “cardinal utility function”.