There are two questions that currently guide my intuitions on how interesting a (speculative) model of agentic behavior might be.
1. How well does this model map onto real agentic behavior?
2. Is the described behavior “naturally” emergent from environmental conditions?
In a recent post I argued, in slightly different words, that seeing agents as confused, uncertain and illogical about their own preferences is fruitful because it answers the second question in a satisfactory way. Internal inconsistency is not only a commonly observable behavior (cf. behavioral economics); it is also an emergent strategy to protect against agents finding adversarial internal representations of external goals. I called this internal reward-hacking.
What I think I failed to communicate is that my final thoughts on preferences competing with each other also come from reflecting on question 2. It makes sense that agents might weigh different memes representing possible preferences for improved decision-making, but this doesn’t answer how the memes behave with respect to that agent.
Memes, not unlike animals, are subject to selection pressures. Moreover, memes can co-adapt or compete with each other, and they can arguably engage in active transmission, such as influencing their host’s behavior to enable their spread. It therefore feels reasonable for models of human cognition to imbue memes with some agency, enabling them to perceive and interact with each other and their hosts (Claude’s literature review indicates that at least some memeticists disagree with me on this point).
A model that aspires to respect these agentic properties of memes would thus be incomplete without describing what incentives memes have to participate in a cognitive system, and how those incentives shape the system’s emergent properties. That’s why I think it’s insufficient to see potential preferences as impartial, passive sub-modules that agents deploy to enable their own rationality. Instead, we should always attempt to answer “what’s in it for the memes?”
The nature of doing interdisciplinary research is that you have to know a little about a lot of things. Unfortunately, it’s hard to tell whether you know a little about something or are just misinformed about it.
Much of my agent-ey thinking is inspired by human cognition and seeks to model it adequately, but I realised I have no solid understanding of the relationship consciousness has to cognition. They’re definitely not the same, since most processes I could describe as cognitive don’t materialise in my consciousness. However, almost all the examples I find insightful feature conscious decision-making. This suggests that cognition without consciousness is opaque to me.
Could you give some examples of what you consider to be conscious and unconscious cognitive processes?
more conscious: deciding what move to make in a chess game.
less conscious: the physical act of playing a move. You can move the piece in a conscious, deliberate way, but in practice the movement usually follows “automatically” from the high-level decision of what move to play.
not conscious: reflex reactions.