My understanding of how the natural abstraction hypothesis relates to trained machine learning models, and what the main challenge in actually applying it is:
How it applies:
A high-dimensional dataset from the real world, such as MNIST, contains “natural abstractions”, i.e. patterns that show up in many places. If you train a machine learning model on the dataset, it will pick up all of the sufficiently redundant information in the dataset (and often a lot of non-redundant information as well), making it function as a sort of “summary statistic” of the data. Different systems trained on the same data would tend to pick up the same information, because the redundancy makes that information a stable property of the dataset itself, available to any learner trained on it.
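As a hedged illustration of that convergence claim, one could train two models on the same data from different random seeds and measure how often their predictions agree on held-out inputs; high agreement is (weak) evidence that both picked up the same redundant patterns. The architecture and hyperparameters below are illustrative assumptions, not anything implied by the hypothesis itself:

```python
# Sketch: do two independently initialized models converge on the same
# input-output behavior when trained on the same data?
import torch
import torch.nn as nn
from torchvision import datasets, transforms

def make_model(seed: int) -> nn.Module:
    torch.manual_seed(seed)  # different seeds -> different initializations
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128), nn.ReLU(),
        nn.Linear(128, 10),
    )

def train(model: nn.Module, loader) -> nn.Module:
    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for images, labels in loader:  # one epoch, for brevity
        opt.zero_grad()
        loss_fn(model(images), labels).backward()
        opt.step()
    return model

tfm = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=tfm)
test_set = datasets.MNIST("data", train=False, download=True, transform=tfm)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256)

model_a = train(make_model(seed=0), train_loader)
model_b = train(make_model(seed=1), train_loader)

# Fraction of held-out inputs on which the two models predict the same class.
agree, total = 0, 0
with torch.no_grad():
    for images, _ in test_loader:
        agree += (model_a(images).argmax(1) == model_b(images).argmax(1)).sum().item()
        total += images.shape[0]
print(f"prediction agreement: {agree / total:.2%}")
```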
How it is a challenge:
Consider what format the machine learning model ends up holding the abstraction in. That is, if you want to get the information back out again, how can you do it? Well, one guaranteed way, for an image classifier, is to take the original images and feed them to the network, which will output classes according to the patterns it learned. Similarly, for a generative model, you can have it generate a dataset that resembles the one it was trained on.
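Concretely, here is a minimal sketch of that guaranteed readout route for the classifier case, assuming PyTorch; `SmallClassifier` and the checkpoint file name are hypothetical placeholders for whatever architecture and weights you actually have:

```python
# Sketch: "extract" what the classifier learned by running the original
# data back through it and recording the outputs.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

class SmallClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.net(x)

model = SmallClassifier()
model.load_state_dict(torch.load("mnist_classifier.pt"))  # hypothetical checkpoint
model.eval()

data = datasets.MNIST("data", train=False, download=True,
                      transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(data, batch_size=256)

all_preds = []
with torch.no_grad():
    for images, _ in loader:
        all_preds.append(model(images).argmax(dim=1))
preds = torch.cat(all_preds)  # the learned patterns, readable only one input at a time
```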
These operations would extract all of the information the models have learned, but they are not very useful, because the extracted information is not in a format that gives you any overview of it. So ideally we’d have a way to extract better-structured information.
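For contrast, here is the kind of raw material a better-structured extraction method would presumably start from: intermediate activations, captured with a forward hook (continuing the previous sketch). Note that this only dumps unlabeled vectors; nothing here organizes them into legible abstractions, which is exactly the gap at issue:

```python
# Continuing the previous sketch: capture hidden-layer activations with a
# forward hook. This gives access to the model's internals, but as raw
# feature vectors with no guaranteed legible structure.
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Hook the post-ReLU hidden layer (index 2 of the Sequential above).
model.net[2].register_forward_hook(save_activation("hidden"))

with torch.no_grad():
    images, _ = next(iter(loader))
    model(images)

hidden = activations["hidden"]  # shape (batch, 128): unstructured feature vectors
```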
But it’s not obvious from the natural abstraction hypothesis that such a method exists. The natural abstraction hypothesis guarantees that the information is embedded in the models somehow, but it doesn’t seem like it guarantees that it’s embedded in some nice format, or that there is a nice way to extract it, beyond the way that it was put in.
I’m not sure whether you agree with this. If the above is true, my lesson is that we need to think of ways to structure the models so as to put the information in a nicer format. But maybe I’m missing something. Maybe I need to re-digest the gKPD argument or something.
This is basically correct, other than the part about not having any guarantee that the information is in a nice format. The Maxent and Abstractions arguments do point toward a relatively nice format, though it’s not yet clear what the right way is to bind the variables of those arguments to stuff in a neural net. (Though I expect the data structures actually used will have additional structure to them on top of the maxent form.)
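For reference, the generic maximum-entropy form being gestured at is the textbook one below; which features f_i to use, and how they bind to a network's internals, is exactly the open question named above, so this is not a claim about the final data structure:

```latex
% The maximum-entropy distribution subject to expectation constraints
% E[f_i(X)] = c_i. The features f_i are abstract placeholders here;
% binding them to neural-net internals is the unresolved step.
\[
  P(x) \;=\; \frac{1}{Z(\lambda)}\,
    \exp\!\Big(\textstyle\sum_i \lambda_i f_i(x)\Big),
  \qquad
  Z(\lambda) \;=\; \sum_{x} \exp\!\Big(\textstyle\sum_i \lambda_i f_i(x)\Big).
\]
```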