The machine learning metaphor you want is the distinction between supervised learning and unsupervised learning. Supervised learning is when someone hands you a bunch of pictures and labels them as being either cats or dogs and it’s your job to infer future cat-dog labels. Unsupervised learning is when someone hands you a bunch of pictures and it’s your job to infer that they can be separated into two clumps which, if you handed them to a human, the human might say “those are cats and those are dogs” (but maybe it’s more complicated than that).
The simplest subtype of unsupervised learning is clustering, where you only get a bunch of unlabeled data points and it’s your job to organize them into clusters, which you might loosely map onto buckets. Roughly speaking, there are three sorts of things that can happen to your clusters as you get more data points, namely:
A data point appears which is so far away from your other clusters that you need a new cluster for it,
Something you thought was one cluster gets broken into two clusters, or
Something you thought was two clusters gets merged into one cluster.
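To make those three events concrete, here is a rough Python sketch. It is a toy under loud assumptions, not a standard algorithm: the function names and the distance thresholds (`new_radius`, `max_diameter`, `merge_radius`) are all made up for illustration, and real clustering methods decide these things in more principled ways.

```python
from itertools import combinations
from math import dist
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(fmean(axis) for axis in zip(*points))


def add_point(clusters, point, new_radius):
    """Event 1: join the nearest cluster, or open a new one if all are too far."""
    if clusters:
        nearest = min(clusters, key=lambda c: dist(centroid(c), point))
        if dist(centroid(nearest), point) <= new_radius:
            nearest.append(point)
            return "joined"
    clusters.append([point])
    return "new"


def split_wide(clusters, max_diameter):
    """Event 2: break a cluster in two when its farthest pair is too far apart."""
    result = []
    for cluster in clusters:
        if len(cluster) < 2:
            result.append(cluster)
            continue
        # Use the two most distant points as seeds for the two halves.
        a, b = max(combinations(cluster, 2), key=lambda pair: dist(*pair))
        if dist(a, b) <= max_diameter:
            result.append(cluster)
            continue
        near_a = [p for p in cluster if dist(p, a) <= dist(p, b)]
        near_b = [p for p in cluster if dist(p, a) > dist(p, b)]
        result += [near_a, near_b]
    return result


def merge_close(clusters, merge_radius):
    """Event 3: fuse two clusters whose centroids are within merge_radius."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if dist(centroid(clusters[i]), centroid(clusters[j])) <= merge_radius:
                # j > i, so popping j leaves index i valid.
                clusters[i].extend(clusters.pop(j))
                merged = True
                break
    return clusters


# A far-away point forces a new cluster (hypothetical threshold of 2.0).
clusters = []
events = [add_point(clusters, p, new_radius=2.0)
          for p in [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]]
print(events, len(clusters))
```

Note that the clusters here are just anonymous lists of points; nothing in the sketch gives any of them a name.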
So far this metaphor / model doesn’t have or need a notion of the “name” of a cluster, which is more complicated. Anyway, this seems like as good a starting point as any for the rationality development needed here.
Ugh, this comment is also on the wrong post; it’s supposed to be a comment to Soft Priors.