A few observations:
Big spoiler (but short)
This is a Gaussian mixture.
Big spoiler (but long)
Each snark observation contains several pieces of information:
The time at which the snark was sighted. I will come back to this one very soon.
Several traits of the snark. I call them the phenotype of the snark.
The first two adjectives each have 5 possibilities. This gives 25 combinations of what I call the main phenotype.
The last three characteristics each have 3 possibilities, giving 27 combinations. I call this part the secondary phenotype.
Whether the snark was a Boojum. This last one is a bit messy because there are 2 booleans but only 3 possibilities:
either the boat didn’t check
or checked, and it wasn’t a Boojum
or checked, and it was a Boojum
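To make this structure concrete, here is a minimal sketch of how one observation could be encoded. All names (`BoojumStatus`, `boojum_status`, `main_phenotype_index`, `secondary_phenotype_index`) and the index ordering are my own assumptions for illustration; the dataset itself may number things differently. I also assume the combination "not checked, but Boojum" never occurs, and fold it into "not checked" if it does.

```python
from enum import Enum
from typing import Tuple


class BoojumStatus(Enum):
    NOT_CHECKED = "the boat didn't check"
    NOT_BOOJUM = "checked, and it wasn't a Boojum"
    BOOJUM = "checked, and it was a Boojum"


def boojum_status(checked: bool, is_boojum: bool) -> BoojumStatus:
    """Collapse the two booleans into the three states that can occur."""
    if not checked:
        # Assumption: the second boolean carries no information if nobody checked.
        return BoojumStatus.NOT_CHECKED
    return BoojumStatus.BOOJUM if is_boojum else BoojumStatus.NOT_BOOJUM


def main_phenotype_index(first: int, second: int) -> int:
    """Index in 0..24 built from two adjectives with 5 options each."""
    assert 0 <= first < 5 and 0 <= second < 5
    return first * 5 + second


def secondary_phenotype_index(traits: Tuple[int, int, int]) -> int:
    """Index in 0..26 built from three traits with 3 options each (base 3)."""
    assert all(0 <= t < 3 for t in traits)
    a, b, c = traits
    return (a * 3 + b) * 3 + c
```

Using an enum instead of two booleans makes the impossible fourth combination unrepresentable, which keeps later grouping and counting simpler.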
Now that we know that, we could try to observe how each trait impacts the dataset. For example, the most common main phenotype is Hollow, yet Crisp. We could try to plot a histogram of the times at which this phenotype, main phenotype 0, appeared.
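A rough sketch of that histogram, using only the standard library and printing text bars instead of a real plot. The `observations` list is placeholder data I generate here so the snippet runs on its own; it is not the actual dataset, and the `(time, main_phenotype)` layout and the `histogram` helper are assumptions.

```python
import random
from collections import Counter


def histogram(times, bin_width):
    """Count sightings per time bin; returns {bin_start: count}, sorted by bin."""
    counts = Counter((t // bin_width) * bin_width for t in times)
    return dict(sorted(counts.items()))


# Placeholder data only: (time, main_phenotype) pairs, NOT the real observations.
rng = random.Random(0)
observations = (
    [(rng.gauss(100, 10), 0) for _ in range(300)]
    + [(rng.gauss(160, 15), 0) for _ in range(200)]
    + [(rng.gauss(130, 20), 1) for _ in range(100)]
)

# Keep only the sighting times of main phenotype 0, then print one bar per bin.
times_0 = [t for t, phenotype in observations if phenotype == 0]
for start, count in histogram(times_0, bin_width=10).items():
    print(f"{start:6.0f} {'#' * (count // 5)}")
```

With the real data, `times_0` would be replaced by the sighting times of the phenotype of interest.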
Aaaaand... It’s obviously a mixture of two Gaussians. This hints at several things:
Snarks can be divided into species.
The density of sightings over time for a given species is a Gaussian.
Main phenotype 0 is made up of 2 species. Maybe we could refine this observation by including the secondary phenotype in the analysis. Maybe one can deduce or guess the species from the phenotype.
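If the two-species guess is right, the two bumps can be separated numerically. Below is a minimal sketch of expectation-maximisation (EM) for a one-dimensional mixture of two Gaussians, written with the standard library only. The function names, the initialisation, and the fixed iteration count are my own choices, not something taken from the puzzle, and the data at the bottom is synthetic placeholder data.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def fit_two_gaussians(xs, iterations=200):
    """EM for a 1-D mixture of two Gaussians.

    Returns two (weight, mean, std) tuples, sorted by mean.
    """
    xs = sorted(xs)
    n = len(xs)
    # Crude initialisation: one component per half of the sorted data.
    lower, upper = xs[: n // 2], xs[n // 2:]
    means = [sum(lower) / len(lower), sum(upper) / len(upper)]
    spread = (xs[-1] - xs[0]) / 4 or 1.0
    stds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability that each point belongs to component 0.
        resp0 = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], stds[0])
            p1 = weights[1] * normal_pdf(x, means[1], stds[1])
            total = p0 + p1
            resp0.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate each component from its weighted points.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1 - r for r in resp0]
            mass = sum(resp)
            if mass < 1e-9:
                continue  # component has collapsed; keep its old parameters
            means[k] = sum(r * x for r, x in zip(resp, xs)) / mass
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / mass
            stds[k] = max(math.sqrt(var), 1e-6)
            weights[k] = mass / n
    return sorted(zip(weights, means, stds), key=lambda c: c[1])


# Placeholder sighting times, NOT the real data: two overlapping bumps.
rng = random.Random(1)
times = [rng.gauss(100, 10) for _ in range(300)] + [rng.gauss(160, 12) for _ in range(200)]
for weight, mean, std in fit_two_gaussians(times):
    print(f"weight={weight:.2f} mean={mean:.1f} std={std:.1f}")
```

The same fit could then be repeated per secondary phenotype to see whether each one falls cleanly into a single bump, which would support guessing the species from the phenotype.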