There’s an X thread showing that the ordering of answer options is, in several cases, a stronger determinant of the model’s answer than its preferences. This doesn’t invalidate the paper’s results, since the authors control for it by varying the order of the answer options and aggregating the results. Still, it strikes me as evidence in favor of the “you are not measuring what you think you are measuring” argument: it suggests that the preferences are relatively weak at best and completely dominated by confounding heuristics at worst.
This doesn’t contradict the Thurstonian model at all. It only shows that order effects are one of the many factors that go into utility variance, which is itself one of the components of the Thurstonian model. Why should it be treated differently from any other such factor? The calculations still show that utility variance (including order effects) decreases with scale (Figure 12); you don’t need to eyeball it from a few examples in a Twitter thread about a single factor.
Hey, first author here.
We responded to the above X thread, and we added an appendix to the paper (Appendix G) explaining how the ordering effects are not an issue but rather a way that some models represent indifference.
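To make the indifference reading concrete, here is a minimal sketch. It is not taken from the paper or from Appendix G. It assumes the standard Thurstonian form, in which each option's utility is an independent Gaussian, and the function names and example probabilities are hypothetical. It shows that a model which always picks the first-listed option ends up, after averaging over both orderings, with the same 50/50 preference probability that the Thurstonian model assigns to two options with equal mean utility.

```python
import math


def thurstonian_pref_prob(mu_a, mu_b, var_a, var_b):
    """P(A preferred over B) when utilities are independent Gaussians.

    Assumes the standard Thurstonian form: Phi((mu_a - mu_b) / sqrt(var_a + var_b)).
    """
    z = (mu_a - mu_b) / math.sqrt(var_a + var_b)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aggregate_over_orderings(p_pick_a_when_first, p_pick_a_when_second):
    """Average the probability of choosing A over both presentation orders."""
    return 0.5 * (p_pick_a_when_first + p_pick_a_when_second)


# Hypothetical model that always picks whichever option is listed first:
# it chooses A with probability 1 when A is first and 0 when A is second.
always_first = aggregate_over_orderings(1.0, 0.0)

# Two options with equal mean utility under the Thurstonian model.
indifferent = thurstonian_pref_prob(0.0, 0.0, 1.0, 1.0)

# Hypothetical model with some position bias but a real preference for A:
# it still picks A 60% of the time when A is listed second.
biased_but_prefers_a = aggregate_over_orderings(1.0, 0.6)

print(always_first, indifferent, biased_but_prefers_a)
```

In this toy setup, pure position bias is indistinguishable from indifference after aggregation, whereas a model with a genuine preference still shows an aggregated probability away from 0.5 despite the bias.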