The natural abstraction hypothesis can be split into three sub-claims, two empirical, one mathematical:
The third one:
Convergence: a wide variety of cognitive architectures learn and use approximately-the-same summaries.
Couldn’t this be operationalized empirically, if a wide variety...learn and give approximately the same predictions and recommendations for action (“if you want this, do this”), i.e. causal predictions?
Human-Compatibility: These summaries are the abstractions used by humans in day-to-day thought/language.
This seems contingent on ‘the human summaries are correct’ and ‘the natural-abstraction summaries are correct’; asserting that both hold, and that they coincide, is just making a claim about a particular type of convergence. (Modulo the possibility that:
“human recommendations (may)/do not describe the system, and (may) instead focus on ‘what you should do’, which requires guesses about factors like ‘capabilities or resources’.”
)
Along the way, it should be possible to prove theorems on what abstractions will be learned in at least some cases. Experiments should then [mostly] probe cases not handled by those theorems, enabling more general models and theorems, eventually leading to a unified theory.
I say ‘mostly’ because probing cases believed to be handled by the theorems may still reveal failures.
Then, the ultimate test of the natural abstraction hypothesis would just be a matter of pointing the abstraction-thermometer at the real world, and seeing if it spits out human-recognizable abstract objects/concepts.
Interesting that this doesn’t involve ‘learners’ communicating with one another, to see what sort of language they would develop. But the approach described above seems more straightforward.
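As a toy illustration of what checking for convergent summaries across learners might look like, here is a minimal numpy sketch using linear CKA (centered kernel alignment), one off-the-shelf representation-similarity measure. The “learners” and data below are stand-ins I made up for illustration, not the actual abstraction-thermometer:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (n samples x d features).

    Returns a value in [0, 1]; values near 1 suggest the two learners encode
    approximately the same information about the same inputs, even when their
    feature dimensions differ.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    numerator = np.linalg.norm(Y.T @ X, "fro") ** 2
    denominator = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return numerator / denominator

rng = np.random.default_rng(0)
world = rng.normal(size=(500, 20))  # shared "environment" samples

# Two toy "learners": different random feature maps of the same environment.
rep_a = np.tanh(world @ rng.normal(size=(20, 64)) / np.sqrt(20))
rep_b = np.tanh(world @ rng.normal(size=(20, 32)) / np.sqrt(20))
# A control representation that has nothing to do with the environment.
rep_c = rng.normal(size=(500, 32))

print(linear_cka(rep_a, rep_b))  # substantially higher: both summarize the same world
print(linear_cka(rep_a, rep_c))  # near zero: nothing shared
```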
It would imply that a wide range of architectures will reliably learn similar high-level concepts from the physical world, that those high-level concepts are exactly the objects/categories/concepts which humans care about (i.e. inputs to human values), and that we can precisely specify those concepts.
It seems good that the program described involves testing a variety of architectures and then seeing how they turn out (concerning object details, if not values), rather than attempting to design understandable architectures, if one wants to avoid the risk of an ‘ontological turn’ whereby an AI develops a way of seeing the world that no longer lines up with ours after it ‘goes big’. (On the other hand, if understanding global systems requires concepts we haven’t yet learned, then we might not be able to understand the maps produced by (natural abstraction) learners without first learning those concepts ourselves. This property, that something can’t be understood without certain prior knowledge or concepts, might be called ‘info-locked maps’ or ‘conceptual irreducibility’. Though it’s just a hypothesis for now.)
Couldn’t this be operationalized empirically, if a wide variety...learn and give approximately the same predictions and recommendations for action (“if you want this, do this”), i.e. causal predictions?
Very good question, and the answer is no. That may also be a true thing, but the hypothesis here is specifically about what structures the systems are using internally. In general, things could give exactly the same externally-visible predictions/actions while using very different internal structures.
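To make that last point concrete, a minimal sketch (just toy numpy linear maps of my own, not anything from the discussion): two models with different internal structures, one direct and one factored through a wider hidden layer, that give identical predictions on every input:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Model A": a single linear map from inputs to predictions.
W = rng.normal(size=(4, 3))

def model_a(x):
    return x @ W

# "Model B": the same end-to-end function, factored through an 8-dimensional
# hidden layer -- a genuinely different internal structure (different shapes,
# different intermediate representations).
U = rng.normal(size=(4, 8))   # input -> hidden
V = np.linalg.pinv(U) @ W     # hidden -> output, chosen so that U @ V == W

def model_b(x):
    hidden = x @ U            # an internal summary model A never computes
    return hidden @ V

x = rng.normal(size=(100, 4))
# Identical externally-visible predictions, despite different internals.
print(np.allclose(model_a(x), model_b(x)))  # True
```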
You are correct that this is a kind of convergence claim. It’s not claiming convergence in all intelligent systems, but I’m not sure exactly what the subset of intelligent systems is to which this claim applies. It has something to do with both limited computation and evolution (in a sense broad enough to include stochastic gradient descent).