The Machine Learning Personality Test

You’ve probably heard of the Myers-Briggs personality test, which is a classification system of 16 different personality types based on the writings of Carl Jung, a man who believed that his library books sometimes spontaneously exploded. Its main advantage is that it manages to classify people without insulting them. (This is accomplished by confounding dimensions: Instead of measuring one property of personality along one dimension, which leads to some scores being considered better than others, you subtract a measurement along one desirable property of personality from a measurement along another desirable property of personality, and call the result one dimension.)

You’ve probably also heard of the MMPI, a test designed by giving lots of questions to mental patients and seeing which ones were answered differently by people with particular diagnoses. This is more like personality clustering for fault diagnosis than a personality test. You may find it useful if you’re crazy. (One of the criticisms of this test is that religious people often test as psychotic: “Do you sometimes think someone else is directing your actions? Is someone else trying to plan events in your life?” Is that a bug, or a feature?)

You may have heard of the Personality Assessment Inventory, a test devised by listing things that psychotherapists thought were important, and trying to come up with questions to test them.

The Big 5 personality test is constructed in a well-motivated way, using factor analysis to try to discover from the data what the true dimensions of personality are.

But these all work from the top down, looking at human behavior (answers) and trying to uncover latent factors farther down. I’m going to propose a personality system that instead starts from the very bottom of your hardware and leaves it to you to work your way up to the variables of interest: the Machine Learning Personality Test (“MLPT”).

Other personality tests try to measure things that people want to measure, but that might not be psychologically real. The MLPT is just the opposite: It tries to measure things that are probably psychologically real, but are at such a low level that people aren’t interested in them. Your mission, should you choose to accept it, is to figure out the connection between the dimensions of the MLPT, and personality traits that you understand and care about.

LW readers are familiar with thinking of people as optimizers. Take that idea, and make 3 assumptions:

  1. People optimize using something like existing algorithms for machine learning.

  2. A person learns parameters for their learning algorithms according to the data they are exposed to.

  3. These parameters generalize across tasks.

Assumption 1 is critical for the MLPT to make any sense. The test classifies people according to the parameter settings they use when learning and optimizing. I mostly use parameters from classification and categorization algorithms.

Assumption 2 is important if you wish to change yourself. This is the great advantage of the MLPT: It not only tells you your personality type, but also how to change your personality type. Simply expose yourself to data such that the MLPT type you desire is more effective than yours at learning that data.

Assumption 3 is something I have no evidence for at all, and may be wholly false.

Here are the dimensions I’ve thought of. Can you name others worth adding?

Learning rate: This parameter says how much you change your weights in response to new information. I was going to say “how much you change your beliefs”, but that would be misleading, because we’re talking about a much finer level of detail. In a neural network model, the learning rate determines how much you change the weight on a connection between 2 neurons each time you adjust the degree to which one neuron’s output affects the other neuron’s input.

People with a high learning rate learn fast and easily, and may be great at memorizing facts. But when it comes to problems where you have a lot of data and are trying to get extremely high performance, they are not able to get as good an optimum. This suggests that expert violinists or baseball players tend to have poor memory and be categorized as slow learners. (Although I’m skeptical that learning rate on motor tasks would generalize to learning rate on history exams.)
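To make the trade-off concrete, here is a minimal sketch, assuming a learner that does nothing but track a single noisy quantity. The function names, the true value of 5.0, and the two learning rates are illustrative choices of mine, not part of any test. The fast learner gets close almost immediately but keeps getting jerked around by noise; the slow learner takes far longer to arrive but ends up with a much tighter estimate.

```python
import random

def track_mean(samples, learning_rate, start=0.0):
    """Online estimate: move a fraction `learning_rate` of the way toward each sample."""
    estimate = start
    history = []
    for x in samples:
        estimate += learning_rate * (x - estimate)
        history.append(estimate)
    return history

def mean_abs_error(history, target):
    """Average distance between the running estimate and the true value."""
    return sum(abs(h - target) for h in history) / len(history)

TRUE_VALUE = 5.0           # hypothetical quantity being learned
rng = random.Random(0)     # fixed seed so the run is repeatable
samples = [rng.gauss(TRUE_VALUE, 1.0) for _ in range(5000)]

fast = track_mean(samples, learning_rate=0.5)
slow = track_mean(samples, learning_rate=0.01)

# Early on, the fast learner is far ahead; with lots of data, the slow one wins.
fast_early = mean_abs_error(fast[:20], TRUE_VALUE)
slow_early = mean_abs_error(slow[:20], TRUE_VALUE)
fast_late = mean_abs_error(fast[-1000:], TRUE_VALUE)
slow_late = mean_abs_error(slow[-1000:], TRUE_VALUE)

print(f"first 20 samples:  fast={fast_early:.3f}  slow={slow_early:.3f}")
print(f"last 1000 samples: fast={fast_late:.3f}  slow={slow_late:.3f}")
```

This is only a one-parameter toy, so it says nothing about whether the same pattern carries over from violins to history exams.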

Regularization rate: This parameter says how strongly to bias your outcome back towards your priors. If your regularization rate isn’t high enough, the parameters you learn may drift to absurdly large values. In some cases, this will cause the entire network to become unstable, at which point learning ceases and you need to be rebooted.

In most ways, regularization is opposed to learning. Increasing the regularization rate without changing the learning rate effectively decreases the learning rate.
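Here is a small sketch of that pull toward the prior, assuming a one-weight logistic regression trained by gradient descent with an L2 penalty. The data, the function name, and the rates are all made up for illustration. The data are perfectly separable, so without regularization the weight never stops growing; with regularization it settles at a modest value near the prior of zero.

```python
import math

def train_weight(xs, ys, learning_rate, reg_rate, steps, prior=0.0):
    """Gradient descent on one logistic-regression weight, with an L2 pull toward `prior`."""
    w = prior
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-w * x))   # predicted probability of class 1
            grad += (p - y) * x                  # gradient of the log loss
        # Average the data gradient, then add the regularization term.
        grad = grad / len(xs) + reg_rate * (w - prior)
        w -= learning_rate * grad
    return w

# Hypothetical, perfectly separable data: negatives on the left, positives on the right.
XS = [-2.0, -1.0, 1.0, 2.0]
YS = [0, 0, 1, 1]

free_2k = train_weight(XS, YS, learning_rate=1.0, reg_rate=0.0, steps=2000)
free_4k = train_weight(XS, YS, learning_rate=1.0, reg_rate=0.0, steps=4000)
tied_2k = train_weight(XS, YS, learning_rate=1.0, reg_rate=0.1, steps=2000)
tied_4k = train_weight(XS, YS, learning_rate=1.0, reg_rate=0.1, steps=4000)

print(f"no regularization:   {free_2k:.3f} after 2000 steps, {free_4k:.3f} after 4000")
print(f"with regularization: {tied_2k:.3f} after 2000 steps, {tied_4k:.3f} after 4000")
```

In this toy the drift is slow rather than explosive, but it has no upper limit; the regularized weight stops moving because the pull toward the prior exactly cancels what the data is asking for.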

People with a high regularization rate might be less prone to mental illness, but not very creative. People with a low regularization rate will get some of the advantages of a high learning rate, without the disadvantages.

Exploration/​Exploitation setting: High exploration means you try out new solutions and new things often. High exploitation means you don’t. High exploitation is conceptually a lot like high regularization.
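One standard way to put a number on this setting is an epsilon-greedy bandit, sketched below under my own assumptions: three slot-machine arms with made-up payoff probabilities, and an `epsilon` that is the fraction of the time you try something at random. With `epsilon=0` the player is a pure exploiter and never leaves the first arm it tries; with a little exploration it finds the best arm.

```python
import random

def run_bandit(arm_probs, epsilon, pulls, seed=0):
    """Epsilon-greedy play: explore a random arm with probability `epsilon`, else exploit."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)     # how often each arm was pulled
    values = [0.0] * len(arm_probs)   # running average reward per arm
    total = 0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_probs))   # explore
        else:
            arm = values.index(max(values))       # exploit (ties go to the first arm)
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total, counts

ARM_PROBS = [0.2, 0.5, 0.8]   # hypothetical payoff rates; the last arm is best

exploiter_total, exploiter_counts = run_bandit(ARM_PROBS, epsilon=0.0, pulls=5000)
explorer_total, explorer_counts = run_bandit(ARM_PROBS, epsilon=0.1, pulls=5000)

print("pure exploiter:", exploiter_total, exploiter_counts)
print("10% explorer:  ", explorer_total, explorer_counts)
```

The exploiter here is stuck on the first thing it tried, much as a heavily regularized learner stays stuck near its priors.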

Number of dimensions to classify on: When you’re learning how to categorize something, how many dimensions do you use? An astonishing percentage of what we do is based on single-dimension discriminations. Some people use only a single dimension even for important and highly complex discrimination tasks, such as choosing a new president, or deciding on the morality of an action.

Using a small number of dimensions results in a high error rate (where “error”, since I’m not assuming category labels exist out in the world, is going to mean your error in predicting outcomes based on your categorizations). Using a large number of dimensions results in slow learning and slow thinking, construction of categories no one else understands, and stress when faced with complex situations. It also produces errors from overfitting, that is, from perceiving patterns where there are none, because you don’t have enough data to learn whether a distinction in outcome is really due to a difference along one of your dimensions or just to chance.

People using too few dimensions will be, well, stupid. They will be incapable of learning many things no matter how much data they’re exposed to. But they can make decisions under pressure and with confidence. They may make good managers. People using too many dimensions will take too long to make decisions, wanting too much data. This dimension may correspond closely to “intelligence”, of the kind that scores well on IQ tests.
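Both failure modes show up in a toy experiment. The sketch below assumes a world where the right category depends on exactly two dimensions, plus 38 more dimensions of pure noise, and a learner that categorizes each new thing by the most similar thing it has already seen (a nearest-neighbor rule). The setup and names are mine, chosen only for illustration. Looking at one dimension, the learner can do no better than chance; looking at all 40 with only 200 examples, it drowns in noise.

```python
import random

def make_points(n, total_dims, rng):
    """Random points; the label depends only on the signs of the first two coordinates."""
    points = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(total_dims)]
        label = (x[0] > 0) != (x[1] > 0)   # XOR of the two signs
        points.append((x, label))
    return points

def squared_distance(a, b, dims_used):
    """Squared Euclidean distance using only the first `dims_used` coordinates."""
    return sum((a[i] - b[i]) ** 2 for i in range(dims_used))

def nearest_neighbor_accuracy(train, test, dims_used):
    """Label each test point like its nearest training point; return the fraction correct."""
    correct = 0
    for x, label in test:
        nearest = min(train, key=lambda t: squared_distance(t[0], x, dims_used))
        if nearest[1] == label:
            correct += 1
    return correct / len(test)

rng = random.Random(0)   # fixed seed so the run is repeatable
train = make_points(200, 40, rng)
test = make_points(200, 40, rng)

accuracy = {k: nearest_neighbor_accuracy(train, test, k) for k in (1, 2, 40)}
for k, acc in accuracy.items():
    print(f"using {k:2d} dimensions: {acc:.0%} correct")
```

More data would eventually rescue the 40-dimension learner, but no amount of data rescues the 1-dimension learner.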

People using different dimensions and different numbers of dimensions will have a very hard time understanding each other.

It may be worth breaking this separately into number of input dimensions and number of output dimensions. But I kinda doubt it. (I guess I’m just a low-dimensional kinda guy.)

Binary /​ Discrete /​ Continuous thinking: Do you categorize your inputs before thinking about them, or try to juggle all their values and do regression in your head? Are you trying to put things in bins, or place them along a continuum?

This probably has the same implications as number of input and output dimensions.

Degree of independence/correlation assumed to exist between dimensions: If the measurements of the things you are categorizing are independent of one another across dimensions, categorization becomes much easier, and you can handle many more dimensions.
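A rough way to see why: count how many numbers you would have to learn per category. The sketch below assumes every dimension takes a small fixed number of values, and compares a learner who treats the dimensions as independent (the "naive Bayes" assumption) with one who keeps a separate probability for every combination. The function names are mine.

```python
def joint_table_size(num_dims, values_per_dim=2):
    """Free parameters per category if every combination of values gets its own probability."""
    return values_per_dim ** num_dims - 1

def independent_table_size(num_dims, values_per_dim=2):
    """Free parameters per category if each dimension is modeled separately."""
    return num_dims * (values_per_dim - 1)

print("dims  independent  fully correlated")
for d in (2, 5, 10, 20):
    print(f"{d:4d}  {independent_table_size(d):11d}  {joint_table_size(d):16d}")
```

The independent learner's workload grows linearly with the number of dimensions, while the fully correlated learner's grows exponentially, which is why only the first can afford many dimensions.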

People assuming high independence might make good scientists, as science has so far been the art of finding dimensions in the real world that are independent and using them for analysis. People assuming high correlations might be better at art, and at perceiving holistic patterns. They might tend to give credence towards New-Age notions that everything is interconnected.

Degree of linearity/nonlinearity assumed: Assuming linearity has advantages and disadvantages similar to those of assuming independence, and assuming nonlinearity has effects similar to those of assuming correlations. (They are not the same; sometimes the real world presents linearity with correlations, or independence with nonlinearity. I just can’t think of anything different to say about them personality-wise.)

I’m going to merge independence/​correlation and linearity/​nonlinearity, because I don’t have anything useful to say to distinguish them. I’m going to merge regularization rate and exploration/​exploitation for similar reasons; those two are a lot like each other anyway. I’m going to ignore binary/​discrete/​continuous, because I didn’t think of it until after writing the personality types below and I’m too lazy to redo them. It’s a lot like number of dimensions anyway.

Now we need to find cute acronyms for our resulting personality types. For this, we will organize our dimensions so that the first and last dimensions are specified with vowels, and the second and third by consonants. (Changing the fourth letter to a vowel and thus providing catchier names is, I think, the main advantage of this test over the Myers-Briggs.)

  • Regularization rate: high = (I)nertial, low = (U)nconventional

  • Learning rate: high = (F)ast, low = (S)low

  • Number of dimensions: (M)any /​ (F)ew

  • Independence /​ linearity: (I)ndependent and linear /​ h(O)listic and nonlinear
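For the bookkeeping-minded, the four letters above can be spelled out as a lookup table. This is just a sketch of the naming scheme as listed; the data structure and function names are my own. Note that F means (F)ast in the second position and (F)ew in the third, so a code has to be read by position.

```python
from itertools import product

# One entry per position in the four-letter code, in order.
DIMENSIONS = [
    ("Regularization rate", {"I": "Inertial (high)", "U": "Unconventional (low)"}),
    ("Learning rate", {"F": "Fast (high)", "S": "Slow (low)"}),
    ("Number of dimensions", {"M": "Many", "F": "Few"}),
    ("Independence / linearity", {"I": "Independent and linear", "O": "hOlistic and nonlinear"}),
]

def all_type_codes():
    """Every four-letter code the scheme allows."""
    letter_choices = [list(options) for _, options in DIMENSIONS]
    return ["".join(letters) for letters in product(*letter_choices)]

def describe(code):
    """Expand a code like 'IFFI' into one line per dimension."""
    return [f"{name}: {options[letter]}"
            for (name, options), letter in zip(DIMENSIONS, code)]

codes = all_type_codes()
print(len(codes), "types:", " ".join(codes))
for line in describe("IFFI"):
    print(line)
```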

Now you may be eager to take the MLPT and find your results!

Sadly, it does not exist. As I said, I’m just proposing it.

But we can at least write fun, horoscope-like personality summaries! (NOTE: These may not be as accurate as an actual horoscope.)

  • IFMI: You like things that appear complex, but can be mastered with a few fundamental rules. You may become an engineer.

  • ISMI: Like IFMI, but you were on the chess team instead of “It’s Academic”.

  • IFMO: You should go to med school.

  • IFFI: You know what you like, and what others should like. You thought four dimensions was too many. You may vote Republican.

  • ISMO: You may be a go master.

  • UFMI, USMI: You over-analyze everything, often arriving at unconventional answers, and this makes you a pain in the ass to those around you. You probably read Less Wrong.

  • USMO: You are very artistic. You don’t believe in personality classification schemes. You may have been to Taos, New Mexico.