I fear this misses an important reason why new work is needed on concept learning for superintelligent agents: straightforward clustering is not necessarily a good tool for concept learning when the space of possible actions is very large, and the examples and counterexamples cannot cover most of it.
To take a toy example from this post, imagine that we have built an AI with superhuman engineering ability, and we would like to set it the task of making us a burrito. We first present the AI with millions of acceptable burritos, along with millions of unacceptable burritos and objects that are not burritos at all. We then ask it to build us things that are more like the positive examples than like the negative examples.
I claim that this is likely to fail disastrously if it evaluates likeness by straightforward clustering in the space of observables it can scan about the examples. All our examples and counterexamples lie on the submanifold of “things we (and previous natural processes) are able to build”, which has high codimension in the manifold of “things the AI is able to build”.
A burrito with a tiny self-replicating nanobot inside, for instance, would cluster closer to all of the positive examples than to all of the negative examples, since there are no tiny self-replicating nanobots in any of the examples or counterexamples, and in all other respects it matches the positive examples better. (Or a toxic molecule that has never before occurred in nature or been built by humans, etc.)
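To make that failure mode concrete, here is a minimal sketch (a toy numerical illustration of my own, not anything from the post) of nearest-centroid classification in the raw observable space. A candidate that matches the positives on every dimension the training data varies on, but also lights up a dimension that is identically zero in every example and counterexample, still lands closer to the positive centroid:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observables: each scanned object is a point in a feature space.
# Training examples only vary along the first 5 dimensions ("things we
# can build"); the 6th dimension ("contains nanobots") is always zero.
good = rng.normal(loc=0.0, scale=0.5, size=(1000, 6))
bad = rng.normal(loc=3.0, scale=0.5, size=(1000, 6))
good[:, 5] = 0.0
bad[:, 5] = 0.0

good_centroid = good.mean(axis=0)
bad_centroid = bad.mean(axis=0)

# A candidate burrito that matches the positives on every scanned
# dimension, except it also scores 1.0 on the never-before-seen
# "nanobot" dimension.
candidate = np.zeros(6)
candidate[5] = 1.0

d_good = np.linalg.norm(candidate - good_centroid)
d_bad = np.linalg.norm(candidate - bad_centroid)
print(d_good < d_bad)  # True: the nanobot burrito clusters with the positives
```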
The sense in which those would be poor attempts to learn the concept is simply not captured by straightforward clustering, and it’s not enough to say that we should try non-parametric models; we would need to think about how a non-parametric model might do this well. (Here’s an example of a parametric learner which tries to confront this problem.)
A key part of the idea (which, again, I think has some fatal flaws) was that concepts are clusters within some representation of the world, which is learned unsupervised, and is in some sense good at predicting the world. One way to think of this representation is as a set of features whose activity levels parsimoniously describe the data about each example. This requires that a disproportionate fraction of the space of feature activations maps close to the manifold that the examples lie on in the space of raw data.
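As a rough sketch of that picture (the choice of PCA for the unsupervised representation and k-means for the clustering is mine, just a stand-in for “some representation learned without labels” and “clusters within it”):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Unlabeled "scans": high-dimensional raw data that actually lies near a
# low-dimensional manifold (here, 3 latent factors embedded in 50 dims).
latent = rng.normal(size=(5000, 3))
mixing = rng.normal(size=(3, 50))
raw_data = latent @ mixing + 0.05 * rng.normal(size=(5000, 50))

# Step 1: learn a compact representation without any labels.  The feature
# activations (the PCA scores) parsimoniously describe each example.
features = PCA(n_components=3).fit_transform(raw_data)

# Step 2: concepts are clusters in that learned feature space, not in the
# raw observables.
concepts = KMeans(n_clusters=4, n_init=10).fit(features)
print(concepts.cluster_centers_.shape)  # (4, 3): concept prototypes in feature space
```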
Of course, you have to choose which features to cluster over, which requires some Bayesian tradeoff between getting a tight fit to the examples (high likelihood) and simplicity of the features (high prior) (clearly I just finished Kaj’s linked paper). But overall I think that unsupervised feature learning is tackling almost exactly the problem you pointed out.
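One standard way to make that tradeoff concrete (a sketch of my own, not anything from Kaj’s paper) is to fit models of increasing complexity and keep the one with the best penalized likelihood, where the penalty plays the role of the simplicity prior:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Stand-in for unlabeled burrito scans: three underlying kinds of object.
data = np.vstack([
    rng.normal(loc=m, scale=0.5, size=(500, 10))
    for m in (0.0, 2.0, 4.0)
])

# More components fit the examples more tightly (higher likelihood), but
# the BIC penalty on extra parameters acts as a prior favoring simpler
# feature sets.
scores = {k: GaussianMixture(n_components=k, random_state=0).fit(data).bic(data)
          for k in range(1, 8)}
best_k = min(scores, key=scores.get)
print(best_k)  # typically 3: the penalty stops us from adding spurious components
```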
In practice, there might be some problems. A potent toxin or a self-replicating nanobot is bad because it harms whoever eats it, but would even a superintelligence learn a feature to detect safety to humans if all it saw of the universe was one million high-resolution scans of burritos? Well, maybe. But I’d trust it more if it also got to observe the context and consequences of burrito-consumption.
--
Anyhow, I agree with you that “be non-parametric!” is not necessarily helpful advice for producing safe burritos. The claim I put forward in the last paragraphs is that if we represent the agent’s goals non-parametrically in terms of examples, in the most obvious way, we seem to avoid some problems with improving the agent’s ontology.