Here’s a partial answer to the question:

It seems like one common and useful type of abstraction is aggregating together distinct things with similar effects. Some examples:
“Heat” can be seen as the aggregation of molecular motion in all sorts of directions; because of chaos, the different directions and different molecules don’t really matter, and therefore we can usefully just add up all their kinetic energies into a variable called “heat”. (A small code sketch follows these examples.)
A species like “humans” can be seen as the aggregation (though disjunction rather than sum) of many distinct genetic patterns. However, ultimately the genetic patterns are similar enough that they all code for basically the same thing.
A person like me can be seen as the aggregation of my state through my entire life trajectory. (Again unlike heat, this would be disjunction rather than sum.) A major part of why the abstraction of “tailcalled” makes sense is that I am causally somewhat consistent across my life trajectory.
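To make the first example concrete, here is a minimal sketch with a made-up toy gas (the masses, velocities, and units are purely illustrative): the identities and directions of the individual molecules are discarded, and only their summed kinetic energy is kept as one aggregate variable.

```python
import random

# Toy "gas": a list of (mass, velocity) pairs for individual molecules.
# All numbers and units here are made up purely for illustration.
molecules = [
    (1.0, (random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)))
    for _ in range(10_000)
]

def kinetic_energy(mass, velocity):
    """Kinetic energy of a single molecule: (1/2) * m * |v|^2."""
    return 0.5 * mass * sum(component ** 2 for component in velocity)

# The aggregation step: the identity and direction of each molecule is
# thrown away, and only the summed energy is kept as a single variable.
total_kinetic_energy = sum(kinetic_energy(m, v) for m, v in molecules)
print(total_kinetic_energy)
```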
An abstraction that aggregates distinct things with similar effects seems like it has a reasonably good chance to be un-path-dependent. However, it’s not quite guaranteed, which you can see from e.g. the third example. While I will have broadly similar effects through my life trajectory, the effects I will have will change over time, and the way they change may depend on what happens to me. For instance, if my brain got destructively scanned and uploaded while my body was left behind, then my effects would be “split”, with my psychology continuing into the upload while my appearance stayed with my dead body (until it decayed).
“Heat” can be seen as the aggregation of molecular motion in all sorts of directions; because of chaos, the different directions and different molecules don’t really matter, and therefore we can usefully just add up all their kinetic energies into a variable called “heat”.
Nitpick: this is not strictly correct. This would be the internal energy of a thermodynamic system, but “heat” in thermodynamics refers to energy that’s exchanged between systems, not energy that’s in a system.
Aside from the nitpick, however, point taken.
An abstraction that aggregates distinct things with similar effects seems like it has a reasonably good chance to be un-path-dependent. However, it’s not quite guaranteed, which you can see from e.g. the third example. While I will have broadly similar effects through my life trajectory, the effects I will have will change over time, and the way they change may depend on what happens to me. For instance, if my brain got destructively scanned and uploaded while my body was left behind, then my effects would be “split”, with my psychology continuing into the upload while my appearance stayed with my dead body (until it decayed).
I think there is a general problem with these path-dependent concepts: the ideal version of the concept might be path-dependent, but in practice we can only use the current physical state to keep track of what the path was. It’s analogous to how an idealized version of personal identity might require a continuous stream of gradually changing agents and so on, but in practice all we have to go on is what memories people have about how things used to be.
For example, in Lockean property rights theory, “who is the rightful owner of a house” is a path-dependent question. You need to trace the entire history of the house in order to figure out who should own it right now. However, in practice we have to implement property rights by storing some information about the ownership of the house in the current physical state.
If you then train an AI to understand the ownership relation and it learns the relation that we have actually implemented rather than the idealized version we have in mind, it can think that what we really care about is who is “recorded” as the owner of a house in the current physical state rather than who is “legitimately” the owner of the house. In extreme cases, that can lead it to take some bizarre actions when you ask it to optimize something that has to do with the concept of property rights.
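To make that gap concrete, here is a minimal sketch with a made-up transfer log and registry (the `Transfer` record, the `voluntary` flag, and all the names are hypothetical, just to illustrate the contrast): the path-dependent notion traces the whole history, the implemented notion just reads the current record, and the two come apart as soon as the record is tampered with.

```python
from dataclasses import dataclass

@dataclass
class Transfer:
    house: str
    seller: str
    buyer: str
    voluntary: bool  # e.g. a sale, as opposed to theft or fraud

# The path-dependent, "idealized" notion: trace the whole history and only
# count legitimate (voluntary) transfers starting from the original owner.
def legitimate_owner(house, original_owner, history):
    owner = original_owner
    for t in history:
        if t.house == house and t.seller == owner and t.voluntary:
            owner = t.buyer
    return owner

# The state-based notion we actually implement: read off whatever the
# current registry happens to say.
def recorded_owner(house, registry):
    return registry[house]

# The two agree as long as the registry tracks the history faithfully...
history = [Transfer("house-1", "alice", "bob", voluntary=True)]
registry = {"house-1": "bob"}
assert legitimate_owner("house-1", "alice", history) == recorded_owner("house-1", registry)

# ...but they diverge once the record is tampered with, even though no
# legitimate transfer happened. An AI that learned only the state-based
# relation would treat the tampered record as the thing that matters.
registry["house-1"] = "mallory"
print(legitimate_owner("house-1", "alice", history))  # "bob"
print(recorded_owner("house-1", registry))            # "mallory"
```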
In the end, I think it comes down to which way of doing it takes up less complexity, or fewer bits of information, in whatever representation the AI is using to encode these relations. If path-dependent concepts are naturally more complicated for the AI to wrap its head around, SGD can find something that’s path-independent and that fits the training data perfectly, and then you could be in trouble. This is a general alignment-failure story, but if we decide we really care about path-dependence then it’s also a concept we’ll want to get the AI to care about somehow.
For example, in Lockean property rights theory, “who is the rightful owner of a house” is a path-dependent question.
Ah, I think this is a fundamentally different kind of abstraction from the “aggregating together distinct things with similar effects” type of abstraction I mentioned. To distinguish them, I suggest we use the name “causal abstraction” for the kind I mentioned, and the name “protocol abstraction” (or something else) for this concept. So:
Causal abstraction: aggregating together distinct phenomena that have similar causal relations into a lumpy concept that can be modelled as having the same causal relations as its constituents
Protocol abstraction: extending your ontology with new “epiphenomenal” variables that follow certain made-up rules (primarily for use in social coordination, so that there is a ground truth even in the presence of deception? - but they can also be used on an individual level, in values)
It’s analogous to how an idealized version of personal identity might require a continuous stream of gradually changing agents and so on, but in practice all we have to go on is what memories people have about how things used to be.
I feel like personal identity has elements of both causal abstraction and protocol abstraction. E.g. social relationships like debts seem to be strongly tied to protocol abstraction, but there’s also lots of social behavior that only relies on causal abstraction.
If you then train an AI to understand the ownership relation and it learns the relation that we have actually implemented rather than the idealized version we have in mind, it can think that what we really care about is who is “recorded” as the owner of a house in the current physical state rather than who is “legitimately” the owner of the house. In extreme cases, that can lead it to take some bizarre actions when you ask it to optimize something that has to do with the concept of property rights.
I agree.
Coming up with a normative theory of agency in the case of protocol abstraction actually sounds like a fairly important task. I have some ideas about how to address causal abstraction, but I haven’t really thought much about protocol abstraction before.
I think your distinction between causal and protocol abstractions makes sense, and it’s related to my distinction between causally relevant vs. causally irrelevant latent variables. It’s not quite the same, because abstractions which are rendered causally irrelevant in some world model can still be causal in the sense of aggregating together a bunch of things with similar causal properties.
I feel like personal identity has elements of both causal abstraction and protocol abstraction. E.g. social relationships like debts seem to be strongly tied to protocol abstraction, but there’s also lots of social behavior that only relies on causal abstraction.
I agree.
Coming up with a normative theory of agency in the case of protocol abstraction actually sounds like a fairly important task. I have some ideas about how to address causal abstraction, but I haven’t really thought much about protocol abstraction before.
Can you clarify what you mean by a “normative theory of agency”? I don’t think I’ve ever seen this phrase before.
Can you clarify what you mean by a “normative theory of agency”? I don’t think I’ve ever seen this phrase before.
What I mean is stuff like decision theory/selection theorems/rationality: studies of how agents normatively should act.
Usually such theories do not take abstractions into account. I have some ideas for how to take causal abstractions into account, but I don’t think I’ve seen protocol abstractions investigated much.
In a sense, they could technically be handled by just having utility functions over universe trajectories rather than universe states, but there are some things about this that seem unnatural (e.g. for the purpose of Alex Turner’s power-seeking theorems, utility functions over trajectories may be extraordinarily power-seeking, and so if we could find a narrower class of utility functions, that would be useful).
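As a rough type-level sketch of that distinction (the names below are mine and purely illustrative, not anything from Turner’s formalism): a utility function over states only sees a single world state, whereas a utility function over trajectories sees the whole history, which is what lets it encode path-dependent concepts directly, at the cost of being a much larger class of functions.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")

# A utility function over states only sees one world state, so a
# path-dependent concept like "legitimate owner" is not directly expressible.
UtilityOverStates = Callable[[State], float]

# A utility function over trajectories sees the whole history, so it can
# express path-dependent concepts directly, but it is a much larger class.
UtilityOverTrajectories = Callable[[Sequence[State]], float]

# Purely illustrative trajectory utility: reward the final state only if no
# "tampering" event occurred anywhere earlier in the history.
def reward_if_never_tampered(trajectory: Sequence[dict]) -> float:
    if any(step.get("tampered", False) for step in trajectory):
        return 0.0
    return float(trajectory[-1].get("reward", 0.0))
```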