Great point, some rambly thoughts on this: one way in which ontology identification could turn out to be like no-free-lunch theorems is that we actually just get the correct translation by default. I.e., in ELK report terminology, we train a reporter using the naive baseline and get the direct translator. This seems related to Alignment by default, and I think of the two the same way (i.e., "This could happen, but it seems very scary to rely on it without better arguments for why it should happen"). I'd say one reason we don't think much about no-free-lunch theorems as a key obstacle to AI is that we've seen tons of cases where good generalization happens because the world is low entropy. I don't think we've seen that kind of evidence for ontology identification not being a problem in practice. That said, "I think ontology identification will be easy, here's why" is another valid response to the question from this post.
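To make the "naive baseline" concrete, here's a minimal sketch of the kind of setup the ELK report has in mind, as I understand it: freeze the predictor, then train a small reporter head to answer questions from the predictor's latent state using only human-labeled answers. All names and sizes here are hypothetical and the code is only illustrative, not the report's actual setup:

```python
# Minimal, hypothetical sketch of the ELK "naive baseline": train a reporter
# on human-labeled question/answer pairs over a frozen predictor's latents.
import torch
import torch.nn as nn

LATENT_DIM, QUESTION_DIM = 256, 32  # made-up sizes

class Reporter(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + QUESTION_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # logit for a yes/no answer
        )

    def forward(self, latent, question):
        return self.net(torch.cat([latent, question], dim=-1))

def train_reporter(reporter, human_labeled_qa, steps=1000):
    """human_labeled_qa yields (latent, question, human_answer) triples,
    where `latent` comes from the frozen predictor."""
    opt = torch.optim.Adam(reporter.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _, (latent, question, answer) in zip(range(steps), human_labeled_qa):
        loss = loss_fn(reporter(latent, question).squeeze(-1), answer)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return reporter
```

The "translation by default" hope is that this training signal happens to produce the direct translator; the worry is that the same loss is minimized just as well by a human simulator, and nothing in the sketch distinguishes the two.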
A related point would be: "Should we think about ontology identification explicitly, or just work on other stuff and eventually solve it implicitly?" My first instinct is to tackle ontology identification directly, but I could see cases where a solution to ontology identification is actually easier to find from another lens. I do think, though, that that other lens will have to tackle a similarly difficult and central problem; just working on approaches that essentially assume away the ontology identification problem will very likely not lead to progress on it.
For examples, do you mean examples of thinking about ontology identification being useful to solve ontology identification, or examples of how a solution would be helpful for alignment?
I’m asking for examples of specific problems in alignment where thinking in terms of ontology identification is more helpful than just thinking about the problem in the usual or obvious way.
I might not have exactly the kind of example you’re looking for, since I’d frame things a bit differently. So I’ll just try to say more about the question “why is it useful to explicitly think about ontology identification?”
One answer is that thinking explicitly about ontology identification can help you notice that there is a problem that you weren’t previously aware of. For example, I used to think that building extremely good models of human irrationality via cogsci for reward learning was probably not very tractable, but could at least lead to an outer alignment solution. I now think you’d also have to solve ontology identification, so I’m now very skeptical of that approach. As you point out in another comment, you could technically treat ontology identification as part of human irrationality (not sure if you’d call this the “usual/obvious way” in this setting?). But what you notice when separating out ontology identification is that if you have some way of solving the ontology identification part, you should probably just use that for ELK and skip the part where you model human irrationalities really well.
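To spell out the dependency: even a very good cogsci-derived reward is a function over the human ontology, so before you can apply it to the AI's world model you still need a translation from the AI's internal state into that ontology. A toy illustration (all types and function names are mine, purely for exposition):

```python
# Toy illustration: a reward learned in the human ontology can only be
# applied to the AI's latent state via an ontology translation `translate`.
from typing import Callable, Dict, List

HumanState = Dict[str, bool]  # e.g. {"diamond_in_vault": True}
AILatent = List[float]        # whatever the AI's world model actually uses

def reward_in_human_ontology(state: HumanState) -> float:
    # Hypothetical output of the cogsci / reward-learning pipeline.
    return 1.0 if state.get("diamond_in_vault", False) else 0.0

def evaluate(ai_latent: AILatent,
             translate: Callable[[AILatent], HumanState]) -> float:
    # `translate` is exactly the ontology identification problem; without it,
    # the learned reward cannot even be evaluated on the AI's plans.
    return reward_in_human_ontology(translate(ai_latent))
```

And conversely, if you had a trustworthy `translate`, you could plug it straight into ELK-style reporting, which is why the detailed irrationality modeling starts to look skippable.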
Another part of my answer is that ontology identification is not an obviously better frame for any single specific problem, but it can be used as a unifying frame to think about problems that would otherwise look quite different. So some examples of where ontology identification appears:
The ELK report setting: you want to give better-informed preference comparisons
The case I mentioned above: you’ve done some cognitive science and are able to learn/write down human rewards in terms of the human ontology, but still need to translate them
You think that your semi-supervised model already has a good understanding of what human values/corrigibility/… are, and your plan is to retarget the search or to otherwise point an optimizer at this model’s understanding of human values. But you need to figure out where exactly in the AI human values are represented (see the toy probing sketch below)
To prevent your AI from becoming deceptive, you want to be able to tell whether it’s thinking certain types of thoughts (such as figuring out whether it could currently take over the world). This means you have to map AI thoughts into things we can understand
You want clear-cut criteria for deciding whether you’re interpreting some neuron correctly. This seems very similar to asking “How do we determine whether a given ontology translation is correct?” or “What does it even mean for an ontology translation to be ‘correct’?”
I think ontology identification is a very good framing for some of these even individually (e.g. getting better preference comparisons), and not so much for others (e.g. if you’re only thinking about avoiding deception, ontology identification might not be your first approach). But the interesting thing is that these problems seemed pretty different to me without the concept of ontology identification, yet suddenly look closely related once we reframe them.
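As a deliberately oversimplified illustration of the third and fifth items, here's the kind of linear-probing setup people use to "find where a concept is represented". The probing method and all names are my own choice here, not something from the ELK report; the point is that deciding whether such a probe has found the model's own concept, rather than a spurious correlate, is exactly the ontology-identification question:

```python
# Toy probing sketch (hypothetical): fit a linear probe per layer against
# human labels for a concept, and report which layer predicts them best.
# Whether the winning probe reflects the model's *own* concept is the open
# "is this ontology translation correct?" question.
from typing import Dict

import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_accuracy(activations: np.ndarray, labels: np.ndarray) -> float:
    """activations: (n_examples, hidden_dim) from one layer;
    labels: (n_examples,) binary human judgments for the concept."""
    probe = LogisticRegression(max_iter=1000).fit(activations, labels)
    return probe.score(activations, labels)  # use a held-out split in practice

def locate_concept(layer_activations: Dict[str, np.ndarray],
                   labels: np.ndarray) -> str:
    # Return the name of the layer whose activations best predict the labels.
    scores = {name: probe_accuracy(acts, labels)
              for name, acts in layer_activations.items()}
    return max(scores, key=scores.get)
```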
Makes sense, thanks for the reply!
For what it’s worth, I do think strong ELK is probably more tractable than the whole cog sci approach for preference learning.