Update: 12th June 2025 - Just came across this Astral Codex Ten post that covers probably 80% of the same ground, but reaches a different conclusion: that investigating the painfully obvious may uncover a non-causal heuristic we take for a universal truth. What I'm wondering about is roughly the opposite: knowing the heuristic is just an imperative written on a rock, and still using it because the margin of risk/safety is acceptable.
I'm sure there is a word for this already (potentially 'to pull a Homer'?), but Claude suggested the name "Paradoxical Heuristic Effectiveness" for situations where a non-causal rule or heuristic outperforms a complicated causal model.
I first became aware of this idea when I learned about the research of psychologist John Gottman, who claims to have identified cues that predict with 94% accuracy whether a married couple will divorce. Well, according to this very pro-Gottman webpage, 67% of all couples will divorce within 40 years. (According to Forbes, it's closer to 43% of American marriages that end in divorce, but that rockets up to 70% for third marriages.)
A slight variation: a heuristic that performs almost as well as a complicated model at drastically less computational cost. I may not be able to predict with 94% accuracy whether a couple will divorce, but I can with 57% accuracy. It's simple: I say, uniformly, "they won't get divorced." I'll be wrong 43% of the time. But unlike Gottman's technique, which requires hours of detailed analysis of microexpressions and playing back video tapes of couples, I don't need to do anything. It is computationally 'cheap', both in terms of human effort and in terms of building spreadsheets, or even the MPEG-4 encoding and decoding of videos of couples.
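The arithmetic here can be sketched in a few lines. This is a minimal simulation, assuming the Forbes figure of a 43% base divorce rate; the couple count and random seed are arbitrary choices for illustration.

```python
import random

random.seed(0)

DIVORCE_RATE = 0.43  # assumed base rate, from the Forbes figure above

# Simulate 10,000 couples: True means the couple eventually divorces.
couples = [random.random() < DIVORCE_RATE for _ in range(10_000)]

# The heuristic does no modelling at all: it predicts "no divorce" for everyone.
predictions = [False] * len(couples)

# Accuracy is just the fraction of couples that in fact stay together.
correct = sum(pred == actual for pred, actual in zip(predictions, couples))
accuracy = correct / len(couples)
print(f"Base-rate heuristic accuracy: {accuracy:.1%}")
```

The point the simulation makes is that the 57% figure costs one constant-time rule, whereas any model that beats it must pay for its extra accuracy in data collection and computation.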
My accuracy, however, rockets up to 70% if I can confirm they have each been married twice before, although that makes the heuristic slightly more causal.
Now, I don't want to debate the relative effectiveness of Gottman's technique, only the observation that his 94% success rate seems much less impressive than just assuming a couple will stay together. I could probably achieve a similar rate of accuracy by simply ascertaining a few facts: 1. How many times, if ever, has either party been divorced before? 2. Have they sought counseling for this particular marriage? 3. Why have they sought counseling?
Now, these are all causally relevant facts. What is startling about my original prediction mechanism, just assuming that all couples will stay together, is that it is arbitrary. It doesn't rely on any actual modelling or prediction, which is what makes it so computationally cheap.
I've been thinking about this recently because of a report of someone merging two text encoder models together, T5xxl and T5 Pile: the author claims to have seen an improvement in prompt adherence for their Flux (an image generation model) outputs; another redditor opines that the improvement is within the same range one would expect from merging random noise into the model.
The exploits of Timothy Dexter appear to be a real-world example of Paradoxical Heuristic Effectiveness: as the story goes, he was trolled into "selling coal to Newcastle", a proverb for an impossible transaction since Newcastle was a coal-mining town – yet he made a fortune because of a serendipitous coal shortage at the time.
"To pull a Homer" is a fictional idiom coined in an early episode of The Simpsons, in which Homer Simpson twice averts a meltdown by blindly reciting "Eeny, meeny, miny, moe" and happening to land on the right button on both occasions.
However, Dexter and Simpson appear to be examples of unknowingly finding a paradoxically effective heuristic with no causal relationship to their success – Dexter had no means of knowing there was a coal shortage (nor, apparently, did he understand Newcastle's reputation as a coal-mining city), and Simpson did not know the function of the button he pushed.
Compare this to my original divorce prediction heuristic with its 43% failure rate: I am fully aware that some predictions will be wrong, but on the balance of probabilities it is still more effective than the opposite rule – saying all marriages will end in divorce.
Nassim Nicholas Taleb gives an alternative interpretation of the story of Thales as the first "option trader". Thales is known for making a fantastic fortune when he bought the rights to all the olive presses in his region before the season; a bumper crop put them in high demand. Taleb says this was not foresight or studious observation of the olive groves – it was a gamble that Thales, as an already wealthy man, was well positioned to take and exploit. After all, even a small crop would still earn him some money from the presses.
But is this the same concept as knowingly but blindly adopting a heuristic, which you as the agent know has no causal reason for being true, but is unreasonably effective relative to the cost of computation?