I do not have a formal definition, but it’s the sort of thing I’m interested in.
In future posts I’d like to explore how I’m sorta talking about the distribution that exists in the actual data structures while gesturing at the idea of an idealized semantic space representing the natural phenomena being described. The natural phenomena and the idealized semantic space are what I’m interested in; the actual data structures are a way to learn about that ideal space, with the motivation that an understanding of the ideal space could be applied inside the domain of neural nets and machine learning, and potentially applied directly in broader scientific/engineering domains.
Trying to formalize what I’m talking about would be a big part of that exploration.
I did describe this stuff in more detail in Zoom Out: Distributions in Semantic Spaces so you might want to read and comment there, but I’ll try to answer your questions a bit here.
By the “input space” and “output space” I am fuzzily referring both to the space of possible values that the data structure of the network’s input and output can take, and to the space of configurations of phenomena generating that data. I might call these the “digital space” and “ideal space” respectively.
So in the case of visual/image space, the digital space would be the set of possible RGB values, while the ideal space would be the space of possible configurations of cameras capturing images (or other ways of generating images). Although most images in the digital space look like nothing but static to people, the ideal space is actually a much larger space than is distinguishable by the resulting data structure, because, for example, I can aim a camera at a monitor and display any pattern of static on that monitor, or aim a different camera at a different monitor and generate all the same images of static. The same data can result from different phenomena.
So you could think of the underlying sets being:
For digital space the underlying set is a nice clean finite set with cardinality two to the power of however many bits there are in your data structure.
For ideal space the underlying set is the incomprehensibly large set of possible phenomena.
I also have two topologies in mind. The topology I’m more interested in I might call the “semantic topology”, which would have as open sets collections of semantically related objects. But I’m also thinking of the semantic topology as being approximated by sufficiently high-dimensional spaces with the usual topology, although the semantic topology is probably coarser than the usual topology. But that is all very ungrounded speculation.
Wouldn’t the output space just be the interval [0,1]
That depends on the network architecture and training. I think it’s more natural to have [0,1]^2 with one dimension mapped to “likelihood of cat” and the other to “likelihood of dog”, rather than have some “cat-not-cat” classifier which might be predisposed to think dogs are even more not a cat than nothing at all. But you could train such a network and in that case, yes, the output space would be the interval [0,1].
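As a toy numpy sketch of that difference (the logit values here are invented for illustration, not taken from any real network): with a single sigmoid output a dog can land further from “cat” than static does, while two independent sigmoid outputs let cat-likelihood and dog-likelihood vary separately.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Invented final-layer logits for three inputs: a cat photo, a dog photo, static.
logits_1d = np.array([4.0, -3.0, -1.0])    # single "cat-not-cat" head

# Same three inputs through a two-head model: (cat logit, dog logit).
logits_2d = np.array([[ 4.0, -2.0],        # cat: high cat, low dog
                      [-3.0,  3.5],        # dog: low cat, high dog
                      [-1.0, -1.0]])       # static: low on both

out1 = sigmoid(logits_1d)
out2 = sigmoid(logits_2d)

# With one output, the dog scores as *more* "not a cat" than static does.
print(out1)
# With [0,1]^2 outputs, dog-ness no longer has to trade off against cat-ness.
print(out2)
```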
But another consideration is whether the semantics you’re actually interested in span the entire input space. It’s very likely they do not, in which case it’s likely they also don’t span the output [0,1], but maybe [0.003, 0.998] or (0.1, pi/4) or some other arbitrary bound. This is quite certain in the case of logits which get normalized by a softmax, in which case it would surprise me if the semantic distribution spanned from -infinity to infinity on any dimension.
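A quick numpy check of the softmax point (the logits are made up): the softmax of any finite logits lands strictly inside (0, 1) on every dimension, so the output never actually reaches the endpoints.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Even very confident (but finite) logits never reach the endpoints of [0, 1].
p = softmax(np.array([30.0, 0.0, -5.0]))
print(p)
```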
and the input space [0,1]^N
My answer is essentially the same as the above, with the exception that the digital space might be quite explicitly the entire [0,1]^N even if most of it sits in an open set of the semantic topology linked by the semantics of “it’s a picture of static noise”.
I also note that [0,1]^N has infinite resolution of colour variability between white and black. This is not true for actual pixels, which have a large but finite set of possible values.
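A small numpy sketch of that quantization (assuming standard 8-bit channels, i.e. 256 levels per channel): many distinct intensities in the ideal interval collapse onto the same digital pixel value.

```python
import numpy as np

levels = 256                                 # 8-bit colour channel
continuous = np.linspace(0.0, 1.0, 10_000)   # fine sample of the ideal interval [0, 1]
quantized = np.round(continuous * (levels - 1)) / (levels - 1)

# Many distinct "ideal" intensities collapse onto the same digital pixel value.
print(len(np.unique(continuous)), "->", len(np.unique(quantized)))
```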
I think even formally defining what you want the underlying set of ideal space to be would be a good post.
I personally find the informal ideas you discuss in between the topology slop ( sorry :) ) to be far more interesting.
The topology I’m more interested in I might call the “semantic topology” which would have as open sets any semantically related objects.
It sounds like you want to suppose the existence of a “semantic distance”, which satisfies all the usual metric space axioms, and then use the metric space topology. And you want this “semantic distance” to somehow correspond to whether humans consider two concepts to be semantically similar.
One issue with using the Euclidean topology on the output space [0,1]^2 and a “semantic topology” on the input space is that your network won’t be continuous by default anymore. The inverse image of an open set would be open in the Euclidean topology, but not necessarily open in the “semantic topology”. You could define the topology on the output space so that the network is continuous by definition (the quotient topology), but then topology really is getting you nothing.
I am interested in semantic distance, but before that I am interested in semantic continuity. I think the idealized topology wouldn’t have a metric, but the geometric spaces in which that topology is embedded give it semantic distance, implicitly giving it a metric.
For example, in image space slight changes in lighting would give small distances, but translating or rotating an image would move it a very great distance away. So the visual space is great for humans to look at, but its metric describes pixel similarity, which we usually don’t care about outside of computer algorithms.
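A toy numpy check of that claim, using a random array as a stand-in for an image: a slight brightness change moves the point only a little in Euclidean pixel space, while a translation, which barely changes the semantics, moves it far.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32))                 # random array standing in for an image in [0,1]^N

brighter = np.clip(img + 0.02, 0.0, 1.0)   # slight lighting change
shifted = np.roll(img, 4, axis=1)          # translate 4 pixels; same content to a human

def dist(a, b):
    return float(np.linalg.norm(a - b))    # Euclidean distance in pixel space

print("lighting change:", dist(img, brighter))
print("translation:   ", dist(img, shifted))
```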
The labelling space would have a much more useful metric. Assuming a 1d logit, distance would correspond to how much something does or does not seem like a cat. With 2d or more logits the situation would become more complicated, but again, distance represents motion towards or away from confidence of whether we’re looking at a cat, a dog, or something else.
But in both cases, the metric is a choice that tells you something about certain kinds of semantics. I’m not confident there would exist a universal metric for semantic distance.
You could define the topology on the output space so that by definition the network is continuous (quotient topology) but then topology really is getting you nothing.
I’d actually be more inclined to do this. I agree it immediately gets you nothing, but it becomes more interesting when you start asking questions like “what are the open sets” and “what do the open sets look like in the latent spaces”.
Bringing back the cat-identifier net: if I look at the set of high cat confidence, will its preimage be the set of all images that are definitely cats? I think that’s a common intuition, but could we prove it? Would there be a way to systematically explore diverse sections of that preimage to verify that they are indeed all definitely cats?
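One hypothetical way to start exploring that preimage, sketched here with a toy linear scorer standing in for a trained network (`cat_confidence`, the weights, and the step size are all my own inventions): gradient-ascend the confidence from several diverse random starting points and inspect where they end up.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=16)                     # toy linear "cat scorer" standing in for a real net

def cat_confidence(x):
    return 1.0 / (1.0 + np.exp(-W @ x))     # sigmoid of the linear score

# Gradient-ascend the confidence from several diverse random starts.
# For this toy model, d(confidence)/dx = s * (1 - s) * W.
starts, endpoints = [], []
for _ in range(5):
    x = rng.random(16)                      # random point in the digital space [0,1]^N
    starts.append(cat_confidence(x))
    for _ in range(200):
        s = cat_confidence(x)
        x = np.clip(x + 0.5 * s * (1 - s) * W, 0.0, 1.0)   # stay inside [0,1]^N
    endpoints.append(x)

# Each endpoint is a (diverse) member of the high-confidence preimage; the open
# question is whether such points would all look like "definitely cats".
print([round(float(cat_confidence(x)), 3) for x in endpoints])
```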
The fact that it’s starting from a trivial assertion doesn’t make it a bad place to start exploring imo.
I think that kind of direction might be what you’re getting at in mentioning the “informal ideas I discuss in between the topology slop”. So it’s true, I might stop thinking in terms of topology eventually, but for now I think it’s helping guide my thinking. I want to move towards thinking in terms of manifolds, and I think noticing the idea of semantic connectivity, i.e. a semantic topological space, without requiring the idea of semantic distance is worthwhile.
I think that might be one of the ideas I’m trying to zero in on: The distributions in the data are always the same and what networks do is change from embedding that distribution in one geometry to embedding it in a different geometry which has different (more useful?) semantic properties.
Good luck with it. I do think the broad direction is pretty promising.
Thanks : )