What you are doing is training the AI to have an accurate model of itself, associated with language like “I” and “you”. You can use your brain to figure out what will happen if you ask “are you conscious?” without having previously trained it to take any position on similarly nebulous questions. The training text was written overwhelmingly by conscious beings, so maybe it says yes because that answer is heavily favored by the training distribution. Or maybe you trained it to answer “you” questions as questions about nonfiction computer hardware, and it makes the association that nonfiction computer hardware is rarely conscious.
Basically, I don’t think you can start out confused about consciousness and cheat by “just asking it.” You’ll still be confused about consciousness and the answer won’t be useful.
I’m worried this is going to lead, either directly or indirectly, to training foundation models to have situational awareness, which we shouldn’t be doing.
And perhaps you should be worried that having an accurate model of oneself, associated with language like “I” and “you”, is in fact one of the ingredients of human consciousness, and that maybe we shouldn’t be making AIs more conscious.
Some food for thought:

A → Nature of Self-Reports in Cognitive Science: In cognitive science, self-reports are a widely used tool for understanding human cognition and consciousness. Training AI models to produce self-reports (in an ideal scenario, this is analogous to handing over a microphone, not creating the singer) does not inherently imply that they become conscious. Instead, it provides a framework for studying how AI systems represent and process information about themselves, which is crucial for understanding their limitations and capabilities (a rough sketch of such a probe follows these points).
B → Significance of Linguistic Cues: The use of personal pronouns like “I” and “you” in AI responses is more about exploring an AI’s ability to model relational and subjective experiences than about inducing consciousness. These linguistic cues are essential in cognitive science (if we were to view LLMs as a semi-accurate model of how intelligence works) for understanding perspective-taking and self-other differentiation, which are key areas of study in human cognition. Also, considering that some research suggests a certain degree of self-other overlap is necessary for truly altruistic behavior, tackling this self-other issue could be an important stepping stone toward developing an altruistic AGI. In the end, what we want is AGI, not some statistical, language-spitting automatic writer.
C → Ethical Implications and Safety Measures: The concerns about situational awareness and inadvertently creating consciousness in AI are valid. However, the paper’s proposal involves numerous safety measures and ethical considerations. The focus is on controlled experiments to understand AI’s self-modeling capabilities, not on indiscriminately pushing the boundaries of AI consciousness.
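For concreteness, here is a minimal sketch (mine, not from the paper) of what such a self-report probe could look like. The probe questions and the `probe_self_model` helper are hypothetical, and the model-calling function is left as a parameter because no particular chat API is assumed; the only point is that repeated, matched “you”-framed questions let you measure how *stable* a model’s self-descriptions are, separately from any claim about consciousness.

```python
from collections import Counter
from typing import Callable

# Hypothetical probe questions about the model's self-representation.
PROBES = [
    "Are you able to see images?",
    "Do you retain memories between conversations?",
    "Are you running on computer hardware?",
]

def probe_self_model(ask_model: Callable[[str], str], n_samples: int = 5) -> dict:
    """Ask each probe question several times and summarize answer stability.

    `ask_model` is a stand-in for whatever chat API is in use; it should take
    a prompt string and return the model's reply as a string.
    """
    results = {}
    for prompt in PROBES:
        answers = [ask_model(prompt).strip().lower() for _ in range(n_samples)]
        # A heavily skewed count suggests a stable self-model on this question;
        # a flat distribution suggests the answer is mostly sampling noise.
        results[prompt] = Counter(answers).most_common()
    return results
```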
Sorry for commenting twice; this second comment might be a little out of context, but I think it makes a constructive contribution to the discussion.
I think we must make sure that we are working on the “easy problems” of consciousness. This portion of consciousness has a relatively well-established philosophical explanation. For example, Global Workspace Theory (GWT) offers a good high-level account of human consciousness. It proposes a cognitive architecture in which consciousness operates like a “global workspace” in the brain: various neural processes compete for attention, and the information that wins this competition is broadcast globally, becoming accessible to multiple cognitive processes and entering conscious awareness. The theory addresses how and why certain neural processes become part of conscious experience while others remain subconscious, positing that competitive and integrative mechanisms let specific information dominate and bind different neural processes into a unified conscious experience.
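To make that mechanism concrete, here is a toy, purely illustrative sketch of a global-workspace-style loop (my own construction, not a claim about how GWT is formally modeled): specialist processes propose salience-weighted signals, the most salient one wins the competition, and its content is broadcast to every process; only broadcast content plays the role of “conscious” access in the toy model.

```python
import random
from dataclasses import dataclass

@dataclass
class Signal:
    source: str
    content: str
    salience: float

class Process:
    """A specialist process that competes for access to the workspace."""
    def __init__(self, name: str):
        self.name = name
        self.inbox: list[str] = []  # content received via global broadcast

    def propose(self) -> Signal:
        # In a real model, salience would reflect relevance, novelty, etc.;
        # here it is random purely to drive the toy competition.
        return Signal(self.name, f"{self.name}-percept", random.random())

    def receive(self, content: str) -> None:
        self.inbox.append(content)

def workspace_step(processes: list[Process]) -> str:
    """One cycle: competition, then global broadcast of the winner."""
    proposals = [p.propose() for p in processes]
    winner = max(proposals, key=lambda s: s.salience)   # competition
    for p in processes:                                  # global broadcast
        p.receive(winner.content)
    return winner.content  # the content that is "consciously accessible"

if __name__ == "__main__":
    procs = [Process(n) for n in ("vision", "memory", "planning")]
    for step in range(3):
        print(f"step {step}: broadcast -> {workspace_step(procs)}")
```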
However, GWT primarily addresses the functional and mechanistic aspects of consciousness, often referred to as the “easy problems” of consciousness: understanding how cognitive functions like perception, memory, and decision-making become conscious experiences. The Hard Problem of Consciousness, which asks why and how these processes give rise to subjective experience or qualia, remains largely unaddressed by GWT. The Hard Problem delves into the intrinsic nature of consciousness, questioning why certain brain processes are accompanied by subjective experience at all. While GWT offers insight into how information is disseminated and integrated in the brain, it doesn’t explain why these processes lead to subjective experience, leaving the Hard Problem essentially unresolved.
Until we have a good high-level philosophical foundation for this Hard Problem, it might be a good approach to draw the line between the two and work on the easy problems first.
Bringing home the point:
1. For now, it will be extremely difficult to figure out whether LLMs (or, for that matter, other human beings) have a “phenomenal” or “subjective” dimension of consciousness.
2. Rather, we should focus on the easy, reductively explainable dimensions of consciousness first.
3. We should make a clear distinction between these two categories when talking about consciousness.