On the role of values: values clearly do play some role in determining which abstractions we use. An alien who observes Earth but does not care about anything on Earth’s surface will likely not have a concept of trees, any more than an alien which has not observed Earth at all. Indifference has a similar effect to lack of data.
However, I expect that the space of abstractions is (approximately) discrete. A mind may use the tree-concept, or not use the tree-concept, but there is no natural abstraction arbitrarily-close-to-tree-but-not-the-same-as-tree. There is no continuum of tree-like abstractions.
So, under this model, values play a role in determining which abstractions we end up choosing, from the discrete set of available abstractions. But they do not play any role in determining the set of abstractions available. For AI/alignment purposes, this is all we need: as long as the set of natural abstractions is discrete and value-independent, and human concepts are drawn from that set, we can precisely define human concepts without a detailed model of human values.
Also, a mostly-unrelated note on the airplane example: when we’re trying to “define” a concept by drawing a bounding box in some space (in this case, a literal bounding box in physical space), it is almost always the case that the bounding box will not actually correspond to the natural abstraction. This is basically the same idea as the cluster structure of thingspace and rubes vs bleggs. (Indeed, Bayesian clustering is directly interpretable as abstraction discovery: the cluster-statistics are the abstract summaries, and they induce conditional independence between the points in each cluster.) So I would interpret the airplane example (and most similar examples in the legal system) not as a change in a natural concept, but rather as humans being bad at formally defining their natural concepts, and needing to update their definitions as new situations crop up. The definitions are not the natural concepts; they’re proxies.
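To unpack that parenthetical, here’s a minimal sketch of what I mean, assuming Gaussian clusters on synthetic data (not any particular model from the post): the fitted per-cluster statistics play the role of the abstract summary, and conditional on the summary and the cluster assignments, the joint log-density of the points decomposes into a sum of independent per-point terms.

```python
# Sketch of "Bayesian clustering as abstraction discovery": the fitted
# per-cluster statistics are the abstract summary, and conditional on
# the summary and the assignments, the joint log-density of the points
# is a sum of independent per-point terms. Synthetic data, illustrative only.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # cluster 0
               rng.normal(8, 1, (50, 2))])  # cluster 1
labels = np.repeat([0, 1], 50)

# "Abstract summary": mean and covariance per cluster.
summary = [(X[labels == k].mean(axis=0), np.cov(X[labels == k].T))
           for k in (0, 1)]

# Conditional independence given the summary: the joint factors pointwise.
log_joint = sum(multivariate_normal(*summary[k]).logpdf(x)
                for x, k in zip(X, labels))
print(log_joint)
```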
However, I expect that the space of abstractions is (approximately) discrete. A mind may use the tree-concept, or not use the tree-concept, but there is no natural abstraction arbitrarily-close-to-tree-but-not-the-same-as-tree. There is no continuum of tree-like abstractions.
This doesn’t seem likely to me. Language is optimized for communicating ideas, but let’s take a simpler example than language: transmitting a 256x256 image of a dog or something, with a palette of 100 colors, and minimizing L2 error (a rough sketch of this setup follows below). I think that:
The palette will be slightly different when minimizing L2 error in RGB space rather than HSL space
The palette will be slightly different when using a suboptimal algorithm (e.g. greedily choosing colors)
The palette will be slightly different when the image is of a slightly different dog
The palette will be slightly different when the image is of the same dog from a different angle
By analogy, shouldn’t concepts vary continuously with small changes in the system’s values, cognitive algorithms, training environment, and perceptual channels?
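For concreteness, here’s a minimal sketch of the palette experiment, assuming k-means as the palette-fitting algorithm and a random array standing in for the actual dog photo:

```python
# Sketch of the palette thought experiment: fit a fixed-size palette by
# k-means, once minimizing L2 error in RGB and once in HSL, and compare.
# The random array is a stand-in for the actual image.
import colorsys
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pixels = rng.random((64 * 64, 3))  # small stand-in image, RGB values in [0, 1]

def fit_palette(points, k=100):
    """k-means centers = the palette that (locally) minimizes L2 error."""
    km = KMeans(n_clusters=k, n_init=3, random_state=0).fit(points)
    return km.cluster_centers_

rgb_palette = fit_palette(pixels)  # L2 error measured in RGB space

hls = np.array([colorsys.rgb_to_hls(*p) for p in pixels])
hls_palette = np.array([colorsys.hls_to_rgb(*c) for c in fit_palette(hls)])
# L2 error measured in HSL space, palette mapped back to RGB for comparison

# The two palettes come out close but not identical: the optimum moves
# slightly when the error metric (or the image, or the algorithm) changes.
```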
The key there is “slightly different”.
Another analogy: consider this clustering problem (picture six well-separated clusters of points, arranged in a rough grid of three columns and two rows).
Different clustering algorithms will indeed find slightly different parameterizations of the clusters, slightly different cluster membership probabilities, etc. But those differences will be slight differences. We still expect different algorithms to cluster things in one of a few discrete ways—e.g. identifying the six main clusters, or only two (top and bottom, projected onto y-axis), or three (left, middle, right, projected onto x-axis), maybe just finding one big cluster if it’s a pretty shitty algorithm, etc. We would not expect to see an entire continuum of different clusters found, where the continuum ranges from “all six separate” to “one big cluster”; we would expect a discrete difference between those two clusterings.
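As a rough illustration (a synthetic stand-in for the figure, assuming k-means and a Gaussian mixture as the two algorithms): both recover the same discrete partition into six clusters, even though their fitted parameters differ slightly.

```python
# Sketch: six well-separated blobs in a 3x2 grid. Two different algorithms
# find slightly different cluster parameters, but the same discrete
# partition of the points.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
centers = [(x, y) for x in (0, 6, 12) for y in (0, 6)]  # 3 columns x 2 rows
X = np.vstack([c + 0.5 * rng.standard_normal((80, 2)) for c in centers])

km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
gm = GaussianMixture(n_components=6, n_init=5, random_state=0).fit(X)

# ~1.0: identical partition up to relabeling, despite different parameterizations.
print(adjusted_rand_score(km.labels_, gm.predict(X)))
print(np.round(km.cluster_centers_, 2))  # slightly different...
print(np.round(gm.means_, 2))            # ...estimates of the same six clusters
```

And sweeping the cluster count downward yields discretely different partitions (e.g. the two rows, or one big blob), not a continuum between “all six separate” and “one big cluster”.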