Well in reality, if you are paying people on Mechanical Turk to classify your images, maybe you don’t want to sample randomly anyhow. Instead you could select maximally informative data points to ask them about.

This potentially helps with the problem of discovering the bounding region. Suppose that one of the features in the transformed space corresponds to shagginess. And suppose that the shaggiest image in our training set is an image of a dog. A naive learning algorithm might conclude that an image full of shag must be a dog. To deal with this problem, we set shagginess to 10, generate an image, and send it to MTurk. If they think it’s a dog, we double our shagginess; if they think it’s not a dog, we halve it. (For this use case, it might be best to ask them to describe the image in a single word… if they’re choosing between dog/cat/other, they might select dog on the basis that it looks kinda like dog hair or something like that.) Repeating this brackets the classification boundary, and a binary search inside the bracket pins down roughly where it lies.
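
To make the search concrete, here’s a minimal sketch of the doubling-then-bisection idea in Python. The `ask_is_dog` oracle and the hidden threshold are made up for illustration; in reality that call would be a (paid) MTurk query.

```python
# A minimal sketch of the boundary search described above, assuming a
# single 1-D "shagginess" feature. ask_is_dog() stands in for a
# hypothetical MTurk query; here it is simulated with a hidden threshold.

HIDDEN_BOUNDARY = 37.0  # unknown to the searcher; workers say "dog" below it

def ask_is_dog(shagginess):
    """Simulated MTurk worker: is an image this shaggy still a dog?"""
    return shagginess < HIDDEN_BOUNDARY

def find_boundary(start=10.0, tolerance=0.5):
    # Phase 1: keep doubling until workers stop saying "dog".
    # This brackets the boundary between lo (dog) and hi (not dog).
    lo, hi = 0.0, start
    while ask_is_dog(hi):
        lo, hi = hi, hi * 2
    # Phase 2: binary search inside the bracket [lo, hi].
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if ask_is_dog(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(find_boundary(), 1))  # → 37.0, close to the hidden boundary
```

Each phase costs a logarithmic number of MTurk queries, which is what makes this cheaper than random sampling near the boundary.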

I’ll bet you could do some math to determine how to get the strongest statistical guarantees with the minimum amount of money spent on MTurk too.

I imagine that a latent space would let us do other cool things, like locate the edges of each class with some confidence.

Yep. If the dog is represented using a convex polytope instead of a sphere, you might even reverse engineer the corners of your current classifier region, and then display them all to the user to show how expansive the classifier’s notion of “dog” is. But the map is not the territory: It’s possible that in some cases, the shape the user wants is actually concave.
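
As a toy illustration of reverse engineering the corners: if the classifier’s “dog” region in a 2-D latent space is approximated by the convex hull of the points it accepts, the hull’s vertices are exactly the extreme points one might render back into images for the user. The latent coordinates below are invented, and the hull routine is the standard monotone-chain algorithm.

```python
# Sketch: find the "corners" of a convex approximation to the classifier's
# "dog" region in a 2-D latent space. Real latent spaces have many more
# dimensions; 2-D keeps the hull computation simple.

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Invented latent coordinates of images the classifier currently calls "dog".
dog_points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
corners = convex_hull(dog_points)
print(corners)  # → [(0, 0), (4, 0), (4, 3), (0, 3)]
```

The interior points drop out, leaving only the extreme “dogs” worth showing the user; a concave target region, as noted above, is exactly what this approximation can’t capture.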

However, I don’t think I’m familiar enough with them yet to speak confidently and technically about the subject (my class is just reaching autoencoders now).

I’m a deep learning noob too. I’m just about finished with Andrew Ng’s Coursera specialization, which was great, but the word “autoencoder” was never used. However, there was some discussion of making use of transformed (“latent”? Staying on the safe side because I’m not familiar with that term) feature spaces. Apparently this is how face recognition systems recognize your face given only a single reference image: Map the reference image into a carefully constructed feature space, then map a new image of you into the same feature space and compute the Euclidean distance. If the distance is small enough, it’s a match.
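
A stripped-down sketch of that matching rule, with a fixed lookup table standing in for the trained embedding network. The image IDs, embedding vectors, and threshold are all made up; a real system would tune the threshold on verification data.

```python
import math

# Toy sketch of one-shot face verification via a learned embedding.
# embed() stands in for the trained network mapping images into the
# latent feature space.

EMBEDDINGS = {  # pretend outputs of the embedding network
    "reference_photo": [0.10, 0.90, 0.30],
    "same_person_new_photo": [0.12, 0.88, 0.31],
    "different_person": [0.80, 0.20, 0.50],
}

def embed(image_id):
    return EMBEDDINGS[image_id]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_match(reference, candidate, threshold=0.2):
    """Match if the two embeddings are closer than the tuned threshold."""
    return euclidean(embed(reference), embed(candidate)) < threshold

print(is_match("reference_photo", "same_person_new_photo"))  # → True
print(is_match("reference_photo", "different_person"))       # → False
```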

Instead you could select maximally informative data points to ask them about.

In this case, information is measured by how much of thingspace would be sheared off if it turned out that a data point should be classified as ‘unknown’. It isn’t immediately clear how to find this without a tractable thingspace-volume subroutine, but I think this would be computationally efficient for both of our ideas.
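
One cheap stand-in for a thingspace-volume subroutine is Monte Carlo: sample points uniformly and count hits. The sketch below estimates a toy region’s volume and scores a query by the volume that would be sheared off if it came back ‘unknown’. The region and the carving rule are invented assumptions for illustration, not anything established above.

```python
import random

# Monte Carlo stand-in for a "thingspace volume" subroutine.

random.seed(0)

def in_region(x, y):
    # Toy classifier region: the lower-left quarter of the unit square.
    return x < 0.5 and y < 0.5

def mc_volume(region, n=100_000):
    """Estimate the region's area by uniform sampling over the unit square."""
    hits = sum(region(random.random(), random.random()) for _ in range(n))
    return hits / n

def info_of_query(query_x, n=100_000):
    """Volume sheared off if a query at x = query_x comes back 'unknown',
    under a crude rule that carves away everything with x >= query_x."""
    def carved(x, y):
        return in_region(x, y) and x < query_x
    return mc_volume(in_region, n) - mc_volume(carved, n)

print(round(mc_volume(in_region), 2))  # close to the true area, 0.25
print(round(info_of_query(0.25), 2))   # close to the sheared volume, 0.125
```

Under this measure, the most informative query is the one whose ‘unknown’ answer would carve off the most volume, which is what you’d hand to MTurk first.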

I’ll bet you could do some math to determine how to get the strongest statistical guarantees with the minimum amount of money spent on MTurk too.

The technique you’re probably looking for is called Bayesian Optimization. Aside: at my school, ‘Optimization’, not ‘Conspiracy’, is unfortunately the word that most frequently follows ‘Bayesian’.
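
For flavor, here is a heavily simplified Bayesian-optimization loop in pure Python: a tiny hand-rolled Gaussian-process surrogate plus an upper-confidence-bound acquisition rule. The objective stands in for an expensive oracle (say, a paid MTurk batch); the kernel, length scale, grid, and objective are all arbitrary choices for illustration, and in practice you would reach for a library rather than code this by hand.

```python
import math

def objective(x):
    """Expensive black-box function to maximize (true peak at x = 2)."""
    return math.exp(-(x - 2.0) ** 2)

def rbf(a, b, length=0.7):
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and variance at x_new given observations (xs, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(x_new, a) for a in xs]
    alpha = solve(K, ys)
    mean = sum(k * w for k, w in zip(k_star, alpha))
    v = solve(K, k_star)
    var = max(rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v)), 0.0)
    return mean, var

def bayes_opt(n_iters=10, kappa=2.0):
    candidates = [i / 50 for i in range(201)]  # grid on [0, 4]
    xs = [0.0, 4.0]                            # two initial probes
    ys = [objective(x) for x in xs]
    for _ in range(n_iters):
        def ucb(x):  # optimism in the face of uncertainty
            m, v = gp_posterior(xs, ys, x)
            return m + kappa * math.sqrt(v)
        x_next = max(candidates, key=ucb)      # most promising next query
        xs.append(x_next)
        ys.append(objective(x_next))
    best_y, best_x = max(zip(ys, xs))
    return best_x

print(round(bayes_opt(), 2))  # lands at (or very near) the true peak, x = 2
```

The point of the surrogate is exactly the money-saving property above: each oracle call is chosen to be maximally worth its cost, rather than sampled at random.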

If the dog is represented using a convex polytope instead of a sphere, you might even reverse engineer the corners of your current classifier region, and then display them all to the user to show how expansive the classifier’s notion of “dog” is. But the map is not the territory: It’s possible that in some cases, the shape the user wants is actually concave.

Even an imperfect estimate of the volume would be useful: for example, perhaps we only find some of the edges and conclude the volume is some fraction of its true value. I have the distinct sense of talking past the point you were trying to make, though.

No, that sounds more or less right.