Thanks for the comment, Stepan!
I think it’s right that the distinction between “lots of data” and “less data” doesn’t really carve reality at its natural joints. I feel like your distinction between “discrete” and “continuous” X also doesn’t fully do this, since you could imagine a case of discrete X where we have only one y for each x in the dataset, and thus need regression, too (at least in principle).
I think the real distinction is probably whether we have “several y’s for each x” in the dataset, or not. The twin dataset case has that, and so even though it’s not a lot of data (only 32 pairs, or 64 total samples), we can essentially apply what I called the “lots of data” case.
Now, I have to admit that by this point I’m somewhat attached to the imperfect state of this post and won’t edit it anymore. But I’ve strongly upvoted your comment and weakly agreed with it, and I hope some confused readers will find it.
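As a rough illustration of the “several y’s for each x” distinction, here is a minimal Python sketch. The function names and the toy numbers are made up for illustration; none of them come from the post or the twin dataset.

```python
from collections import defaultdict
from statistics import mean


def group_ys_by_x(xs, ys):
    """Collect all observed y values under their x value."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return dict(groups)


def has_several_ys_per_x(xs, ys):
    """True if every distinct x appears with at least two y values."""
    return all(len(v) >= 2 for v in group_ys_by_x(xs, ys).values())


def conditional_means(xs, ys):
    """Estimate E[y | x] directly by averaging the y's seen for each x."""
    return {x: mean(v) for x, v in group_ys_by_x(xs, ys).items()}


# Hypothetical toy data: discrete x with repeated observations.
replicated_xs = [0, 0, 1, 1]
replicated_ys = [1.0, 3.0, 4.0, 6.0]

# Hypothetical toy data: discrete x, but only one y per x.
single_xs = [0, 1, 2, 3]
single_ys = [1.0, 2.0, 2.5, 4.0]

if has_several_ys_per_x(replicated_xs, replicated_ys):
    # Several y's per x: per-x averages are available without a model.
    print(conditional_means(replicated_xs, replicated_ys))

if not has_several_ys_per_x(single_xs, single_ys):
    # One y per x: per-x averages just reproduce the data, so some
    # form of regression across x values would be needed instead.
    print("only one y per x; a regression model is needed")
```

In the first toy dataset the per-x averages can be read off directly, even though there are only four samples. In the second, x is still discrete, but each x has a single y, so nothing can be averaged within a group.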