I am still confused about these topics. We know that any behavior can be expressed as maximizing some complicated world-history utility function, and that therefore anything at all could be called rational under some such function. So I sometimes think of rationality as a spectrum: the simpler the utility function justifying your actions, the more rational you are. Under such a definition, rationality at its highest end may actually be opposed to human values, so it makes a lot of sense to focus on intelligence that is not fully rational.
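The "any behavior can be rationalized" point can be made concrete with a toy sketch: for any fixed action history, there is a degenerate utility function that the agent maximizes by construction. All names here are invented for illustration.

```python
# Toy illustration: any fixed action sequence is trivially "rational"
# under a utility function that rewards exactly that world-history.

observed_history = ("wake", "eat", "burn_money", "sleep")

def degenerate_utility(history):
    """Assigns utility 1 to the one observed history, 0 to everything else."""
    return 1.0 if tuple(history) == observed_history else 0.0

# The observed behavior maximizes this utility by construction, so the
# utility function "explains" nothing about the agent.
candidates = [observed_history, ("wake", "eat", "save_money", "sleep")]
best = max(candidates, key=degenerate_utility)
print(best == observed_history)  # True by construction
```

The catch, of course, is that such a utility function is exactly as complex as the behavior itself, which is what motivates the simplicity spectrum above.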
I'm not really sure what you mean by a "honing epistemics" kind of rationality, but I understand that moral uncertainty, from the perspective of the AGI, may increase the chance that it keeps some small fraction of the universe for us, so that would also be great. Is that what you mean? I don't think it is going to be easy to have the AGI consider some phenomena as outside its scope (such that it would be irrational to meddle with them). If we want the AGI to leave us alone, then this is a value we need to include in its utility function somehow.
Utility function evolution is complicated. I worry a lot about it, particularly because it seems to be one of the ways to achieve corrigibility, and we really want that, but it also looks like a violation of goal-integrity from the perspective of the AGI. Maybe it is possible for the AGI to consider the "module" responsible for giving it feedback as part of itself, just as we (usually) consider our midbrain and other evolutionarily ancient "subcortical" areas as part of us rather than some "other" system interfering with our higher goals.
That conception of "rationality as simplicity" is very unusual. I would offer an almost perfectly opposite view: an agent that cares only about hunger is a more primitive and less advanced being than one that cares about both hunger and thirst. The more sophisticated the being, the more components its utility function seems to have.
With "honing epistemics" I am trying to get at the property that makes a rationalist a rationalist. Being a homo economicus doesn't make you especially principled in your epistemics.
I agree my conception is unusual, and I am ready to abandon it in favor of some better definition. At the same time, I feel that a utility function with too many components becomes useless as a concept.
Because here I'm trying to derive the utility from the actions, I feel we understand a being better the less information is required to encode its utility function, in a Kolmogorov complexity sense; if it's too complex, then there is no good explanation for the actions, and we conclude the agent is acting somewhat randomly.
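The intuition can be sketched with a crude proxy: Kolmogorov complexity is uncomputable, but the compressed size of an action log gives a rough stand-in for "how short an explanation of this behavior can be". This is purely illustrative; the action names are made up.

```python
# Sketch: a regular policy admits a much shorter "explanation" (smaller
# compressed description) than a near-random one. zlib compressed size
# is used here as a crude, computable stand-in for Kolmogorov complexity.
import random
import zlib

def description_length(actions):
    """Compressed size in bytes of an action sequence."""
    return len(zlib.compress(" ".join(actions).encode()))

# A perfectly regular agent vs. one choosing actions at random.
regular = ["eat" if i % 2 == 0 else "drink" for i in range(100)]
random.seed(0)
noisy = [random.choice(["eat", "drink", "run", "hide"]) for _ in range(100)]

# The regular behavior compresses far better, i.e. it has a simpler
# candidate "utility function" explaining it.
print(description_length(regular) < description_length(noisy))  # True
```

On this view, "acting somewhat randomly" corresponds to an action log that barely compresses at all.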
Maybe trying to derive the utility as a 'compression' of the actions is where the problem is, and I should distinguish more carefully between what the agent does and what the agent wants. An agent is then irrational only if its wants are inconsistent with each other; if its actions are inconsistent with what it wants, it is merely incompetent, which is something else.
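The "inconsistent wants" criterion has a classic concrete form: a cycle in pairwise preferences (A over B, B over C, C over A), which leaves the agent exploitable regardless of how competently it acts. A minimal sketch, with a hypothetical set-of-pairs representation of preferences:

```python
# Detect whether a set of pairwise preferences is internally consistent,
# i.e. whether the directed preference graph contains a cycle.

def has_preference_cycle(prefs):
    """DFS cycle detection over preferences given as (better, worse) pairs."""
    graph = {}
    for better, worse in prefs:
        graph.setdefault(better, []).append(worse)

    def visit(node, stack, done):
        if node in stack:
            return True          # revisited a node on the current path: cycle
        if node in done:
            return False
        stack.add(node)
        found = any(visit(n, stack, done) for n in graph.get(node, []))
        stack.discard(node)
        done.add(node)
        return found

    return any(visit(n, set(), set()) for n in graph)

cyclic = {("apple", "banana"), ("banana", "cherry"), ("cherry", "apple")}
print(has_preference_cycle(cyclic))  # True: irrational wants

consistent = {("apple", "banana"), ("banana", "cherry")}
print(has_preference_cycle(consistent))  # False: wants are coherent
```

Incompetence, by contrast, would show up not in this graph but in a gap between the ordering it encodes and the actions actually taken.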