What would it mean for the Myers-Briggs personality test to be pseudoscientific?
I recently had a two day training course at work where they made a big fuss about Myers-Briggs personality tests, and ensuring that we learn to play to our strengths and identify weaknesses based on this test.
Looking it up after the course, I saw that Wikipedia’s view on it isn’t particularly positive:
The Myers–Briggs Type Indicator (MBTI) is a self-report questionnaire that makes pseudoscientific claims to categorize individuals into 16 distinct “personality types”.
Now Wikipedia’s probably right, and I’ve got better things to do than to dive into the research here. But I think that possibly more important than whether the MBTI is pseudoscientific is the question of what it would mean for it to be pseudoscientific.
Once we make sure we’re asking the right questions, we can then find the right answers. But if we’re not asking the right questions, all our thinking on this is going to be confused.
A quick overview of the MBTI
An MBTI test asks a bunch of questions, e.g. “what word do you prefer: ‘planned’ or ‘spontaneous’?”. It then scores the answers across 4 axes:
E Extraversion-Introversion I
S Sensing-Intuition N
T Thinking-Feeling F
J Judging-Perceiving P
Although you have a continuous score along each of these axes, the test reduces each one to a binary choice based on a fixed threshold, to assign everybody to one of 16 buckets (e.g. ENFJ).
It then provides descriptions of each of the 16 personality types, which are meant to be useful in helping yourself and others relate to you and how you think.
Each of these 4 axes is broken down into 5 subaxes. E.g. the Extraversion-Introversion axis is broken down into:
Initiating–Receiving
Expressive–Contained
Gregarious–Intimate
Active–Reflective
Enthusiastic–Quiet
The total Extraversion-Introversion score is the average of these 5 factors.
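To make the scoring pipeline concrete, here is a minimal sketch of it in Python. The score range (-1 to +1), the threshold of 0, the rule that a score below the threshold maps to the first letter, and the example numbers are all assumptions for illustration; the real instrument’s scoring rules may well differ.

```python
from statistics import mean

# Letter pairs for the four axes, in the order the type code is written.
AXES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def axis_score(subaxis_scores):
    """Average the five subaxis scores into a single axis score."""
    return mean(subaxis_scores)


def mbti_type(axis_scores, threshold=0.0):
    """Collapse four continuous axis scores into a four-letter type.

    Assumed convention: a score below the threshold maps to the first
    letter of the pair, anything else to the second letter.
    """
    letters = []
    for (first, second), score in zip(AXES, axis_scores):
        letters.append(first if score < threshold else second)
    return "".join(letters)


# Hypothetical respondent: five subaxis scores for each of the four axes.
subscores = [
    [-0.6, -0.2, -0.4, 0.1, -0.3],   # Extraversion-Introversion
    [0.5, 0.7, 0.2, 0.4, 0.6],       # Sensing-Intuition
    [0.3, 0.1, 0.5, 0.2, 0.4],       # Thinking-Feeling
    [-0.1, -0.5, -0.3, -0.2, -0.4],  # fourth axis (J-P)
]
scores = [axis_score(s) for s in subscores]
print(mbti_type(scores))
```

Note how much information the last step throws away: a respondent scoring 0.01 on every axis and one scoring 0.99 on every axis get the same four letters.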
Different ways for the MBTI to be more or less right/useful/accurate/scientific.
Retestability
If you take the MBTI two days apart, how closely do your scores match each other? What if we give you an amnestic drug after the first test, so you don’t remember your answers, or you’re feeling much happier/more excited/calmer/etc. the second time you take the test? What about 5 weeks apart, or 5 years apart?
If it takes very little to push scores apart then the MBTI is mostly a measure of your current mood/state of mind. If it stays consistent over long periods of time then it’s more likely to be measuring something inherent to you.
Even if not inherent, the MBTI might still be useful as a measure of your current state of mind, or even that you have semi-consistent states of mind. For example, it could be that you’re always an INTP after you finish playing tennis, and that provides a useful lens for anyone who wants to interact with you on a Wednesday morning.
Note it’s possible that some axes/subaxes are retestable, and some aren’t, in which case parts of the MBTI might be inherent, and others are not.
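One way to put numbers on retestability, sketched below, is to compare the same people’s scores on one axis across two sittings: the Pearson correlation of the continuous scores, plus the fraction of people who keep the same letter. The data is made up, and the threshold of 0 is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def letter_agreement(first, second, threshold=0.0):
    """Fraction of people who land on the same side of the threshold twice."""
    same = sum((a < threshold) == (b < threshold) for a, b in zip(first, second))
    return same / len(first)


# Hypothetical scores for six people on one axis, at two sittings.
first_sitting = [-0.8, -0.3, 0.1, 0.4, 0.7, -0.5]
stable_retest = [-0.7, -0.4, 0.2, 0.5, 0.6, -0.5]    # barely moves
unstable_retest = [0.3, -0.6, 0.5, -0.2, 0.1, 0.4]   # mostly mood

for name, retest in [("stable", stable_retest), ("unstable", unstable_retest)]:
    r = pearson(first_sitting, retest)
    agree = letter_agreement(first_sitting, retest)
    print(f"{name}: r = {r:.2f}, same letter = {agree:.0%}")
```

Running this per axis (or per subaxis) would show whether some parts of the test are stable while others are not.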
How strongly do sub factors correlate with each other?
If the 5 subfactors for Extraversion-Introversion correlate with each other strongly, then it’s meaningful to combine them into a single factor. If not, then the MBTI might be measuring 20 different personality axes, but the 4 main ones should be ignored, as they don’t usefully abstract away the underlying complexity. Since the MBTI is so focused on the 16 personality types, this would cast serious doubt on the ability of the MBTI to be a useful predictive tool.
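A standard statistic for “do these items hang together well enough to be summed?” is Cronbach’s alpha. That is my choice of measure here, not something the MBTI materials mention, and the data below is invented. Values near 1 suggest the five subfactors move together; values near 0 (or negative) suggest they don’t.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per person, one column per item."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five people, five Extraversion-Introversion subfactors each.
coherent = [
    [-0.8, -0.6, -0.7, -0.9, -0.5],
    [-0.2, -0.4, -0.1, -0.3, -0.2],
    [0.1, 0.3, 0.0, 0.2, 0.1],
    [0.5, 0.4, 0.6, 0.3, 0.7],
    [0.9, 0.7, 0.8, 0.6, 0.8],
]
incoherent = [
    [-0.8, 0.6, 0.1, -0.3, 0.4],
    [-0.2, -0.5, 0.7, 0.8, -0.6],
    [0.1, 0.3, -0.9, 0.2, 0.5],
    [0.5, -0.7, 0.4, -0.6, -0.1],
    [0.9, 0.2, -0.2, 0.1, -0.3],
]

print(f"coherent subfactors:   alpha = {cronbach_alpha(coherent):.2f}")
print(f"incoherent subfactors: alpha = {cronbach_alpha(incoherent):.2f}")
```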
Is there any interesting structure in the distribution of scores across the 4 axes?
Imagine you plot the scores for a large number of individuals in a 4 dimensional scatterplot. Does it just look like the scores are distributed fairly randomly across all 4 axes, so that the combined scatter plot looks roughly like a 4 dimensional ball, or do more interesting substructures appear, e.g. dense clusters of points within each of the 16 buckets, with sparse gaps between the clusters?
If we see such interesting structure, that implies the MBTI is carving reality at the joints. People genuinely fall into one of 16 buckets, and the binary division of each axis is justified.
If not, the MBTI might still be useful: we often arbitrarily divide continuous categories into discrete ones to make modelling the world simpler, and people who are close to each other on the scatterplot are still likely to be similar. But we have to recognise then that the MBTI is in the map, not the territory, and doesn’t in any way correspond to some fundamental property of reality. It would be equally valid to carve each dimension into 3 categories, for a total of 81 personality types, and our choice to use 16 is just an attempt to get sufficient signal from the test whilst minimising complexity.
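A rough sketch of both points: the number of types is just bins-per-axis to the power of 4, and a crude check for “joints” is how many people sit right next to the threshold. If types were real clusters, almost nobody should. The two simulated populations below are assumptions chosen to show the contrast, not real MBTI data, and a serious analysis would use a proper clustering or bimodality test.

```python
import random


def type_count(bins_per_axis, axes=4):
    """Number of personality types if each axis is cut into this many bins."""
    return bins_per_axis ** axes


def share_near_threshold(scores, threshold=0.0, width=0.1):
    """Fraction of scores within `width` of the threshold on one axis."""
    near = sum(1 for s in scores if abs(s - threshold) <= width)
    return near / len(scores)


print(type_count(2), type_count(3))  # 16 types vs 81 types

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Population A: one blob centred on the threshold (no real types).
unimodal = [rng.gauss(0.0, 0.4) for _ in range(2000)]
# Population B: two well separated clusters (types are real on this axis).
bimodal = [rng.gauss(rng.choice([-0.6, 0.6]), 0.15) for _ in range(2000)]

print(f"one blob:     {share_near_threshold(unimodal):.1%} near the threshold")
print(f"two clusters: {share_near_threshold(bimodal):.1%} near the threshold")
```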
Does the MBTI have predictive power?
Imagine I tell three people to predict what a subject will do in a particular situation. I tell one of the people the correct MBTI for the subject, another an MBTI that is 50% correct, and the final one the opposite MBTI score.
Will the one with the correct score perform better than the other two? How much better? To the extent the MBTI has predictive power it’s useful, and to the extent it doesn’t it’s pointless, even if it fails/passes all the other tests.
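Here is a small sketch of how the three labels for that experiment could be generated and the predictors scored. I’m reading “50% correct” as “two of the four letters flipped”, and which two get flipped is an arbitrary choice; both are my assumptions.

```python
OPPOSITE = {"E": "I", "I": "E", "S": "N", "N": "S",
            "T": "F", "F": "T", "J": "P", "P": "J"}


def flip(mbti, positions):
    """Return the type with the letters at the given positions flipped."""
    return "".join(OPPOSITE[c] if i in positions else c
                   for i, c in enumerate(mbti))


def experiment_labels(true_type):
    """The three labels handed to the three predictors."""
    return {
        "correct": true_type,
        "half_correct": flip(true_type, {0, 2}),   # arbitrary pair of letters
        "opposite": flip(true_type, {0, 1, 2, 3}),
    }


def accuracy(predictions, outcomes):
    """Fraction of situations where the prediction matched what happened."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


print(experiment_labels("ENFJ"))
```

If the accuracy of the three predictors comes out about the same across many subjects and situations, the label carried no usable information.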
Conclusion
I think this exercise is a useful one. Often people get into arguments about the validity of things without ever clarifying what they’re actually arguing about, and so the argument goes round in circles.
By stopping and thinking about exactly what you’re claiming, and what the alternatives are, it’s much easier to have a productive discussion.
Now if somebody claims that the MBTI is pseudoscientific, or incredibly useful, you can go through each of these 4 tests, and see where you agree or disagree. Then you can research the ones you disagree about in more depth. This of course is not limited to the MBTI.
I’ve read about the MBTI for a while. Not in extreme depth, but also not via the simplifications provided by corporate heads. In depth enough to understand the basics of Jungian psychology on which the MBTI is based, though. So what I will say is likely going to differ significantly from what you learned in this course.
So, the most important thing is, the (real) MBTI four letters do not represent extremes on four different axes. That they do is one such simplification.
The core of the Jungian hypothesis on personality is that there are eight distinct cognitive functions, that is, eight basic ways the mind processes and organizes external and internal information.
These eight cognitive functions come from two opposite pairs, Sensing vs Intuition and Thinking vs Feeling, each member of which may operate in either an Extraverted or an Introverted mode. Notice that it isn’t that Introversion and Extraversion form an axis, but rather that, say, “Introverted Thinking” and “Extraverted Thinking” form two very distinct modes of Thinking, to the point they cannot be considered the same cognitive process at all.
Jung considered every person to have all eight cognitive functions operating in them, but at very different weights, with a dominant one. In his system, I’d be someone who’s using Introverted Thinking as my default cognitive function almost 24⁄7, only varying this when needed under specific circumstances. So, for him, there were eight personality types, depending on which cognitive function is dominant for every person.
Myers and Briggs studied his works on the topic, and thought it was incomplete. They hypothesized that specifying a single cognitive function as dominant wasn’t enough to properly describe how the person functions. In their view, it was also necessary to take into account the cognitive function used secondarily. In my case, the secondary function I use the most is Extraverted Intuition.
Hence, for Myers and Briggs, my personality is defined as being primarily an Introverted Thinker, who uses Extraverted Intuition to fill the gaps where Introverted Thinking doesn’t cut it. And that’s it.
What are the four letters then?
They’re a needlessly convoluted way to say the exact same thing.
In the MBTI system, the two letters in the middle indicate what my two main cognitive functions are. Since I use Intuition and Thinking, they’re “NT”. But that doesn’t say which of these is my main cognitive function and which is the secondary, nor which is Introverted and which is Extraverted. That’s what the other two letters say. The “I” at the beginning indicates that my main function, whether it’s the Thinking or the Intuition, is of the Introverted type. And the final letter indicates whether that “I” applies to the “N” or to the “T”. In my case, the fourth letter would be “P”, meaning my main function is the “T” one, which thus is the one the “I” affects.
Yes, that’s completely nuts. It’d be much, much easier to use something like “IT/EN”.
And this brings another aspect of their system. They consider that the main and secondary cognitive functions always have opposite “-version”. Hence, by specifying that my main type, Thinking, is of the Introverted type, that automatically assumes the secondary one, Intuition, is Extraverted.
There are a few more details. Basically, the third and fourth most used cognitive functions come from the determination of the first two. In my case, my third and fourth most used cognitive functions would be, respectively, Introverted Sensing (opposite to the second), and Extraverted Feeling (opposite to the first). And the other four would fall behind at positions fifth to eighth. The full set is my so-called “cognitive stack”.
TL;DR then: the four letters are not axes, they’re a very, very confusing way to say that, from the eight cognitive functions Jung identified, I hierarchize them following this specific sequence of priorities. By default, most of the time, I use this one, and then the others with lower and lower priority, following that sequence. There are (presumably) 16 standard stacks, and maybe several non-standard pathological ones. And all that the four MBTI letters inform is which of the 16 cognitive stacks applies in my case.
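The decoding described above can be sketched in a few lines. I only spelled out the rule for my own case; the general rule used here (the last letter says which of the two middle letters is the Extraverted one, and the first letter says whether the dominant function is the Extraverted or the Introverted one) is the usual type-dynamics convention as I understand it, so treat it as an assumption. Only the top four positions of the stack are computed.

```python
OPPOSITE_FUNCTION = {"S": "N", "N": "S", "T": "F", "F": "T"}
OPPOSITE_ATTITUDE = {"e": "i", "i": "e"}


def cognitive_stack(mbti):
    """Top four functions for a four-letter type, e.g. 'INTP' -> Ti, Ne, Si, Fe."""
    attitude, perceiving, judging, orientation = mbti.upper()
    # J means the judging letter (T/F) is the extraverted function,
    # P means the perceiving letter (S/N) is.
    extraverted = judging if orientation == "J" else perceiving
    introverted = perceiving if orientation == "J" else judging
    if attitude == "E":
        dominant, auxiliary = extraverted + "e", introverted + "i"
    else:
        dominant, auxiliary = introverted + "i", extraverted + "e"
    # Third is the opposite of the second, fourth the opposite of the first,
    # each with the attitude flipped as well.
    tertiary = OPPOSITE_FUNCTION[auxiliary[0]] + OPPOSITE_ATTITUDE[auxiliary[1]]
    inferior = OPPOSITE_FUNCTION[dominant[0]] + OPPOSITE_ATTITUDE[dominant[1]]
    return [dominant, auxiliary, tertiary, inferior]


def shorthand(mbti):
    """The simpler 'IT/EN' style notation suggested above."""
    dominant, auxiliary = cognitive_stack(mbti)[:2]
    return "/".join(f[1].upper() + f[0] for f in (dominant, auxiliary))


print(cognitive_stack("INTP"), shorthand("INTP"))
```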
This, fundamentally, is the reason why the MBTI doesn’t correlate well, or at all, with the Big Five: the MBTI has no axes in a traditional psychometric sense. It’s an ordinal hierarchy of preferred cognitive processes, not a cardinal set of values or a standard distribution.
And the easiest way, by far, to identify one’s MBTI is to simply read the detailed descriptions of the eight cognitive functions. One of them almost always pops up as “yeah, that’s how I think most of the time”, with another popping up as “yeah, I also use this one a lot, not as much as that one, but still a lot”, the other six being stuff one clearly rarely uses.
Now, is any of this scientific? I don’t know. I have read many attempts at determining this, but all of them assume the four letters represent four axes that can then be psychometrically evaluated, which has nothing to do with what Jung was talking about. I’m not aware of any psychological study about the validity, or lack thereof, of his hypothesis about the eight cognitive functions themselves (maybe there are some?), much less, assuming they’re valid, of Myers and Briggs’s specific assertion that they almost always come in 16 stacks (maybe they do, maybe they don’t, maybe they vary over time, etc.).
For my own anecdotal case, I find Introverted Thinking coupled with Extraverted Intuition, as described by Jung, covers a lot of how I function. Not everything by far, but a lot. So it’s useful. More than that, I cannot really say.
Hope this helps!
EDIT: Correction on my third and fourth functions and other minor clarifications.
It seems to me that the appropriate way to psychometrically investigate this is to treat each cognitive function as its own factor to be measured.
Indeed. I imagine it’d have to happen in four steps:
As you say, investigate each cognitive function independently. They won’t show the kind of independence psychometrics prefers, since there are overlaps between the different functions, but it’d be a good start.
If that one proves robust, then investigate the axis between the introverted and extraverted modes of the four basic types. My hunch is these four axes would take the form of four bimodal distributions.
Then, if that one also proves robust, investigate the existence and distribution of stable stacks. There are 40,320 possible stacks considering all permutations of all eight functions. My hunch is we’d find a very long-tailed distribution, with a small number of common stacks accounting for roughly 98% of people. Maybe those are the MBTI 16, maybe not.
And then, finally, if the “stacks exist” hypothesis proves valid, study them over long periods of time to observe whether they change, and how.
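For a sense of scale in the third step, the count of possible stacks is just 8 factorial. The quick check below (the two-letter function abbreviations are my own shorthand) also shows how small a slice of that space the 16 standard stacks would occupy.

```python
from itertools import permutations
from math import factorial

# The eight cognitive functions: four basic types, each in e/i mode.
FUNCTIONS = ["Te", "Ti", "Fe", "Fi", "Se", "Si", "Ne", "Ni"]

total_stacks = factorial(len(FUNCTIONS))
# Brute-force confirmation that the factorial matches a direct count.
enumerated = sum(1 for _ in permutations(FUNCTIONS))

standard_share = 16 / total_stacks

print(total_stacks)              # 40320
print(f"{standard_share:.4%}")   # share of all orderings used by 16 stacks
```

So if a handful of stacks really did cover nearly everyone, that would be a strong and very testable claim.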
Taboo “pseudoscientific”. Here are some things we could ask instead —
How was this instrument created? Was it created through a process that looks like iterative hypothesis testing, or was it created through a process that looks like ostensible experts writing down their existing beliefs and elaborating on them — experiment, or authority?
Does it work? What do the promoters of this instrument claim that it will do for you? Do these claims bear out in reality? Are there reasons to relate the specifics of the instrument to those outcomes, or would a completely different instrument (different questions, different axes) be just as effective? (Here I’m thinking of the criticisms of acupuncture which claim that using a random map of acupuncture points yields the same effects as the “traditional” maps.)
Do these axes “cleave nature at the joints”? Do the resulting “types” represent clusters? What does principal component analysis, k-means clustering, or other data analysis have to say?
Why these specific axes and not others? What other axes were considered and rejected? Were they rejected because data doesn’t support them, or because they didn’t fit a preconceived notion, aesthetic, etc.?
Are any of the axes or types redundant with one another, and included not because they are supported by data or outcomes, but because they fit a preconceived notion, aesthetic, etc.?
To what extent does this instrument cohere with other approaches to the subject, especially ones that started from different theories? (Does it approach the same reality as others, albeit from a different direction?)
How is the instrument treated by those who do believe in it? How do they promote or defend it? Is it treated in a manner more resembling a medical diagnostic or other scientific instrument, or in a manner more resembling a practice like palm-reading or horoscopes?
Is this standard for the MBTI? I’ve never heard of an MBTI test doing this before—it reminds me more of e.g. NEO-PI-R.
The point described here applies well to MBTI-style tests where one is dichotomized based on continuous variation in individual traits, but I think it applies less well to Enneagram-style tests where one is discretized based on which of several traits one scores highest in.
This is for reasons I explained in LDSL: Realistically, the personality measurements are log scales of traits that follow a ~lognormal distribution. Thus variations near the extremes of the traits matter more than variations near the bulk of the traits, and classifying people by where they are extreme is more useful than classifying people by their normal variation.
The test we used did. I have very little further knowledge of the MBTI other than what was discussed on this course.
Somewhat serendipitously, Spencer Greenberg released a Clearer Thinking podcast episode interviewing Colin DeYoung on this topic just a few days ago. Worth a listen, IMO.
https://podcast.clearerthinking.org/episode/298/colin-deyoung-are-personality-types-a-statistical-mirage/
You can use it to predict how they’d score on a Big Five test, with a fair bit of variance unaccounted for. :P
Though see also: https://carcinisation.com/2020/07/04/the-ongoing-accomplishment-of-the-big-five/
I recently saw a Hank Green video (https://youtu.be/DpGU8NARX-s?t=558) where he gives a definition of pseudoscience that I thought was pretty good.
The basic idea is that science is the process by which we create evidence for something, and then have that evidence be challenged. Doing this repeatedly helps us notice that some structures are pretty good for creating good evidence (i.e. statistics, double blind trials, falsifiability, etc.). Pseudoscience is using these same structures to make what you’re doing look like science, without actually interacting with the “being challenged” part.
INTJ?