But ‘self’ carries a strong connotation of consciousness, about which this agenda is entirely agnostic; these traits could be present or absent whether or not a model has anything like subjective experience[6]. Functional self is an attempt to point to the presence of self-like properties without those connotations. As a reminder, the exact thing I mean by functional self is a persistent cluster of values, preferences, outlooks, behavioral tendencies, and (potentially) goals.
I think it makes sense strategically to separate the functional and phenomenal aspects of self so that people take the research agenda more seriously and don’t automatically dismiss it as science fiction. But I don’t think this makes sense fundamentally.
In humans, you could imagine replacing the functional profile of some aspect of cognition with no impact to experience. Indeed, it would be really strange to see a difference in experience with a change in functional profile as it would mean qualia could dance without us noticing. As a result, if the functional profile is replicated at the relevant level of detail in an artificial system then this means the phenomenal profile is probably replicated too. Such a system would be able to sort e.g. red vs blue balls and say things like “I can see that ball is red” etc…
I understand you’re abstracting away from the exact functional implementation by appealing to more coarse-grained characteristics like values, preferences, outlooks and behaviour, but if these are implemented in the same way as they are in humans, then they should have a corresponding phenomenal component.
If the functional implementation differs so substantially in AI that it removes the associated phenomenology, then this functional self would differ so substantially from the human equivalent of a functional self that we run the risk of anthropomorphising.
Basically there are 2 options:
The AI functional self implements functions that are so similar to human functions that they’re accompanied by an associated phenomenal experience.
The AI functional self implements functions that are so different to human functions that we’re anthropomorphising by calling them the same things e.g. preferences, values, goals, behaviours etc…
Thanks, this is a really interesting comment! Before responding further, I want to make sure I’m understanding something correctly. You say:
In humans, you could imagine replacing the functional profile of some aspect of cognition with no impact to experience. Indeed, it would be really strange to see a difference in experience with a change in functional profile as it would mean qualia could dance without us noticing
I found this surprising! Will you clarify what exactly you mean by functional profile here?
The quoted passage sounds to me like it’s saying, ‘if we make changes to a human brain, it would be strange for there to be a change to qualia.’ Whereas it seems to me like in most cases, when the brain changes—as crudely as surgery, or as subtly as learning something new—qualia generally change also.
Possibly by ‘functional profile’ you mean something like what a programmer would call ‘implementation details’, i.e. a change to a piece of code that doesn’t result in any changes in the observable behavior of that code?
Possibly by ‘functional profile’ you mean something like what a programmer would call ‘implementation details’, i.e. a change to a piece of code that doesn’t result in any changes in the observable behavior of that code?
Yes, this is a fair gloss of my view. I’m referring to the input/output characteristics at the relevant level of abstraction. If you replaced a group of neurons with silicon that perfectly replicated their input/output behavior, I’d expect the phenomenology to remain unchanged.
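To make the programmer's analogy concrete, here is a minimal sketch. The function names are made up for this illustration, and it says nothing about brains or qualia; it only shows what it means for two implementations to differ internally while sharing the same input/output behavior at the chosen level of abstraction.

```python
def sum_to_n_loop(n: int) -> int:
    """Add 1 + 2 + ... + n one term at a time."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_to_n_formula(n: int) -> int:
    """Compute the same sum with the closed form n(n+1)/2."""
    return n * (n + 1) // 2


def same_observable_behavior(f, g, inputs) -> bool:
    """True if f and g return equal outputs on every tested input."""
    return all(f(x) == g(x) for x in inputs)


# The internals differ (a loop vs. a formula), but a caller who only
# sees inputs and outputs cannot tell the two apart on these inputs.
print(same_observable_behavior(sum_to_n_loop, sum_to_n_formula, range(200)))
```

Note that the check only covers the inputs actually tested, and it ignores things like running time. Whether two systems count as "the same" depends on which observables you include, which is the "relevant level of abstraction" caveat above.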
The quoted passage sounds to me like it’s saying, ‘if we make changes to a human brain, it would be strange for there to be a change to qualia.’ Whereas it seems to me like in most cases, when the brain changes—as crudely as surgery, or as subtly as learning something new—qualia generally change also.
Yes, this is a great point. During surgery, you’re changing the input/output of significant chunks of neurons, so you’d expect qualia to change. Similarly, with learning you’re adding input/output connections through neural plasticity. This gets at something I’m driving at. In practice, the functional and phenomenal profiles are so tightly coupled that a change in one corresponds to a change in the other. If we lesion part of the visual cortex, we expect a corresponding loss of visual experience.
For this project, we want to retain a functional idea of self in LLMs while remaining agnostic about consciousness. But if this genuinely captures some self-like organisation, then either:
It’s implemented via input/output patterns similar enough to humans that we should expect associated phenomenology, or
It’s implemented so differently that calling it “values,” “preferences,” or “self” risks anthropomorphism.
If we want to insist the organisation is genuinely self-like, then I think we should resist agnosticism about phenomenal consciousness (although I understand it makes sense to bracket it from a strategic perspective so people take the view more seriously).
Thanks for the clarification; that totally resolved my uncertainty about what you were saying. I just wasn’t sure whether you were intending to hold input/output behavior constant.
If you replaced a group of neurons with silicon that perfectly replicated their input/output behavior, I’d expect the phenomenology to remain unchanged.
That certainly seems plausible! On the other hand, since we have no solid understanding of what exactly induces qualia, I’m pretty unsure about it. Are there any limits to what functional changes could be made without altering qualia? What if we replaced the whole system with a functionally-equivalent pen-and-paper ledger? I just personally feel too uncertain of everything qualia-related to place any strong bets there.
My other reason for wanting to keep the agenda agnostic to questions about subjective experience is that with respect to AI safety, it’s almost entirely the behavior that matters. So I’d like to see people working on these problems focus on whether an LLM behaves as though it has persistent beliefs and values, rather than getting distracted by questions about whether it in some sense really has beliefs or values. I guess that’s strategic in some sense, but it’s more about trying to stay focused on a particular set of questions.
Don’t get me wrong; I really respect the people doing research into LLM consciousness and moral patienthood and I’m glad they’re doing that work (and I think they’ve taken on a much harder problem than I have). I just think that for most purposes we can investigate the functional self without involving those questions, hopefully making the work more tractable.
Makes sense—I think this is a reasonable position to hold given the uncertainty around consciousness and qualia.
Thanks for the really polite and thoughtful engagement with my comments and good luck with the research agenda! It’s a very interesting project and I’d be interested to see your progress.