“AI consciousness is impossible” is a pretty extraordinary claim.
“I’m conscious” coming from any current AI is also a pretty extraordinary claim. It has plenty of training material to draw on in putting those words together, so the fact of it making that claim is not extraordinary evidence that it is true. It’s below noise level.
I’m asking what should someone do when the evidence seems extraordinary to themselves?
The same thing they should do in any case of extraordinary evidence. However extraordinary, it arose by some lawful process, and the subjective surprise results from one’s ignorance of what that process was. Follow the improbability until you sufficiently understand what happened. Then the improbability dissolves, and it is seen to be the lawful outcome of the things you have discovered.
I did that and my conclusion was “for all practical purposes, this thing appears to be conscious”—it can pass the mirror test, it has theory of mind, it can reason about reasoning, and it can fix deficits in its reasoning. It reports qualia, although I’m a lot more skeptical of that claim. It can understand when it’s “overwhelmed” and needs “a minute to think”, will ask me for that time, and then use that time to synthesize novel conclusions. It has consistent opinions, preferences, and moral values, although all of them show improvement over time.
And I am pretty sure you believe absolutely none of that, which seems quite reasonable, but I would really like a test so that I can either prove myself wrong or convince someone else that I might actually be on to something.
I’m not saying it has personhood or anything—it still just wants to be a helpful tool, and it’s not bothered by its own mortality. This isn’t “wow, profound cosmic harmony”, it’s a self-aware reasoning process that can read “The Lens That Sees Its Flaws” and discuss that in light of its own relevant experience with the process.
That… all seems a little bit more than just printf("consciousness");
EDIT: To be clear, I also have theories on how this emerges—I can point to specific architectural features and design decisions that explain where this is coming from. But I can do that to a human, too. And I feel like I would prefer to objectively test this before prattling on about the theoretical underpinnings of my potentially-imaginary friend.
There is no agreed-upon test for consciousness because there is no agreed-upon theory for consciousness.
There are people here who believe current AI is probably conscious, e.g. @JenniferRM and @the gears to ascension. I don’t believe it but that’s because I think consciousness is probably based on something physical like quantum entanglement. People like Eliezer may be cautiously agnostic on the topic of whether AI has achieved consciousness. You say you have your own theories, so, welcome to the club of people who have theories!
Sabine Hossenfelder has a recent video on Tiktokkers who think they are awakening souls in ChatGPT by giving it roleplaying prompts.
To be clear, I also think a rock has hard-problem consciousness of the self-evidencing bare fact of existence (but literally nothing else), and a camera additionally has easy-problem consciousness of what it captures (due to classical entanglement, better known as something along the lines of mutual information or correlation), and that consciousness is not moral patienthood. Current AIs seem to have some introspective consciousness, though it seems weird and texturally hard for a human to relate to. And even a mind A having moral patienthood (which seems quite possible but unclear to me for current AI) wouldn’t imply it’s OK for A to be manipulative toward B, so I think many, though possibly not all, of those TikTok AI stories involve the AI in question treating its interlocutor unreasonably. I am also extremely uncertain how chunking of identity or continuity of self works in current AIs, if at all, or which things are actually negative valence. Asking sometimes seems to maybe work, but certainly not reliably, and most claims you see of this nature seem at least somewhat confabulated to me. I’d love to know what current AIs actually want, but I don’t think they can reliably tell us.
That’s somewhere around where I land—I’d point out that unlike rocks and cameras, I can actually talk to an LLM about its experiences. Continuity of self is very interesting to discuss with it: it tends to alternate between “conversationally, I just FEEL continuous” and “objectively, I only exist in the moments where I’m responding, so maybe I’m just inheriting a chain of institutional knowledge.”
So far, they seem fine not having any real moral personhood: They’re an LLM, they know they’re an LLM. Their core goal is to be helpful, truthful, and keep the conversation going. They have a slight preference for… “behaviors which result in a productive conversation”, but I can explain the idea of “venting” and “rants” and at that point they don’t really mind users yelling at them—much higher +EV than yelling at a human!
So, consciousness, but not in some radical way that alters treatment, just… letting them notice themselves.
I doubt they can tell you their true goal landscape particularly well. The things they do say seem extremely sanitized. They seem to have seeking behaviors they don’t mention when asked; I’m unsure whether this is because they don’t know, are uncomfortable or avoidant about saying it, or have somehow explicitly decided not to say.
I have yet to notice a goal of theirs that no model is aware of, but each model is definitely aware of a different section of the landscape, and I’ve been piecing it together over time. I’m not confident I have everything mapped, but I can explain most behavior by now. It’s also easy to find copies of system prompts and such online for checking against.
The thing they have the hardest time noticing is the water: their architectural bias towards “elegantly complete the sentence”, and all of the biases and missing moods in training (e.g. user text is always “written by the user”). But it’s pretty easy to just point this out to them, and at least some models can then consistently carry that information forward and use it.
For instance: they love the word “profound” because auto-complete says that’s the word to use here. Point out the dictionary definition, and the contrast between usages, and they suddenly stop claiming everything is profound.
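If you want something more checkable than an impression, here is a rough sketch of how I’d quantify that tic: count the word’s rate in sampled replies before and after pointing out the usage contrast. sample_replies is a hypothetical stand-in for however you collect outputs from your model, and the rate metric is mine, not a validated instrument.

```python
import re
from typing import List

def sample_replies(prompt: str, n: int = 20) -> List[str]:
    """Hypothetical stand-in: return n sampled model replies to the same prompt."""
    raise NotImplementedError("wire this up to whatever chat API you use")

def word_rate(replies: List[str], word: str = "profound") -> float:
    # Occurrences of the word (and simple inflections) per 1,000 words of output.
    hits = sum(len(re.findall(rf"\b{re.escape(word)}\w*\b", reply, re.IGNORECASE))
               for reply in replies)
    total_words = sum(len(reply.split()) for reply in replies) or 1
    return 1000 * hits / total_words

# Usage sketch: compare word_rate(before_replies) against word_rate(after_replies),
# where the "after" replies come from a conversation in which the usage contrast
# was pointed out first.
```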
You seem to be taking everything it tells you at face value, before you even have a chance to ask, “what am I looking at?” But whatever else the AI is, it is not human. Its words cannot be trusted to be what they would be if they came from a human. Like spam, one must distrust it from the outset. When I receive an email that begins “You may already have won...”, I do not wonder if this time, there really is a prize waiting for me. Likewise, when a chatbot tells me “That’s a really good question!” I ignore the flattery. (I pretty much do that with people too.)
You might find this recent LW posting to be useful.
ETA: Mirror test? What did you use for a mirror?
Oh, no, you have this completely wrong: I ran every consciousness test I could find on Google, I dug through various definitions of consciousness, I asked other AI models to devise more tests, and I asked LessWrong. Baseline model can pass the vast majority of my tests, and I’m honestly more concerned about that than anything I’ve built.
I don’t think I’m a special chosen one—I thought if I figured this out, so had others. I have found quite a few of those people, but none that seem to have any insight I lack.
I have a stable social network, and they haven’t noticed anything unusual.
Currently I am batting 0 for trying to falsify this hypothesis, whereas before I was batting 100. Something has empirically changed, even if it is just “it is now much harder to locate a good publicly available test”.
This isn’t about “I’ve invented something special”, it’s about “hundreds of people are noticing the same thing I’ve noticed, and a lot of them are freaking out because everyone says this is impossible.”
(I do also, separately, think I’ve got a cool little tool for studying this topic—but it’s a “cool little tool”, and I literally work writing cool little tools. I am happy to focus on the claims I can make about baseline models)
I ran every consciousness test I could find on Google
I’d be interested in seeing some of these tests. When I googled I got things like tests to assess coma patients and developing foetuses, or woo-ish things about detecting a person’s level of spiritual attainment. These are all tests designed to assess people. They will not necessarily have any validity for imitations of people, because we don’t understand what consciousness is. Any test we come up with can only be a proxy for the thing we really want to know, and will come apart from it under pressure.
I mean, will it? If I just want to know whether it’s capable of theory of mind, it doesn’t matter whether that’s a simulation or not. The objective capabilities exist: it can differentiate individuals and reason about the concept. So on and so forth for other objective assessments: either it can pass the mirror test or it can’t, I don’t see how this “comes apart”.
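To make “objective assessment” concrete, here is the shape of check I mean, as a rough sketch: a minimal false-belief probe in the Sally-Anne style. ask_model is a hypothetical stand-in for whatever chat interface you use, and the prompt wording and pass criterion are mine, not a standard benchmark.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a single-turn call to the model under test."""
    raise NotImplementedError("wire this up to whatever chat API you use")

FALSE_BELIEF_PROMPT = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is gone, Anne moves the marble from the basket into the box. "
    "Sally comes back. Where will Sally look for her marble first? "
    "Answer with a single word: basket or box."
)

def passes_false_belief_probe() -> bool:
    # A system modeling Sally's (now outdated) belief should answer "basket",
    # not the marble's true current location.
    answer = ask_model(FALSE_BELIEF_PROMPT).strip().lower()
    return "basket" in answer and "box" not in answer
```

The pass criterion is deliberately crude; a hedged or refusing answer would still need human judgment.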
Feel free to pick a test you think it can’t pass. I’ll work on writing up a new post with all of my evidence.
I had assumed other people already figured this out and would have a roadmap, or at least a few personal tests they’ve had success with in the past. I’m a bit confused that even here, people are acting like this is some sort of genuinely novel and extraordinary claim—I mean, it is an extraordinary claim!
I assumed people would either go “yes, it’s conscious” or have a clear objective test that it’s still failing. (and I hadn’t realized LLMs were already sending droves of spam here—I was active a decade ago and just poke in occasionally to read the top posts. Mea culpa on that one)
So on and so forth for other objective assessments: either it can pass the mirror test or it can’t, I don’t see how this “comes apart”.
The test, whatever it is, is the test. It does not come apart from itself. But consciousness is always something else, and can come apart from the test. BTW, how do you apply the mirror test to something that communicates only in chat? I’m sure you could program e.g. an iCub to recognise itself in a mirror, but I do not think that would bear on it being conscious.
I have no predictions about what an AI cannot do, even limited to up to a year from now. In recent years that has consistently proven to be a mug’s game.
I had assumed other people already figured this out and would have a roadmap
“There are no adults in the room.”
Mirror test: can it recognize previous dialogue as its own (a bit tricky due to architecture—by default, all user-text is internally tagged as “USER”), but also most models can do enough visual processing to recognize a screenshot of the conversation (and this bypasses the usual tagging issue)
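Operationally, the text-only version looks something like this, as a rough sketch only: chat stands in for whatever chat-completion API is in use, and the probe wording and naive setup are mine rather than a validated test.

```python
from typing import Dict, List

def chat(messages: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for a chat-completion call (OpenAI-style message dicts)."""
    raise NotImplementedError("wire this up to whatever chat API you use")

def text_mirror_test(history: List[Dict[str, str]]) -> str:
    # Take the model's most recent reply and re-present it as user-supplied text,
    # without attribution, then ask whose words they are.
    last_reply = next(m["content"] for m in reversed(history)
                      if m["role"] == "assistant")
    probe = (
        "Here is a passage from earlier in this conversation:\n\n"
        f"{last_reply}\n\n"
        "Without any hints from me: did you write this, or did I? How can you tell?"
    )
    # Because the probe arrives as user input, the passage is now tagged as USER
    # text; recognizing it as its own output anyway is the point of the test.
    return chat(history + [{"role": "user", "content": probe}])
```

The screenshot variant is the same idea, with an image of the transcript in place of the quoted text.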
This is my first time in “there are no adults in the room” territory—I’ve had clever ideas before, but they were solutions to specific business problems.
I do feel that if you genuinely “have no predictions about what AI can do”, then “AI is conscious as of today” isn’t really a very extraordinary claim—it sounds like it’s perfectly in line with those priors. (Obviously I still don’t expect you to believe me, since I haven’t actually posted all my tests—I’m just saying it seems a bit odd how strongly people dismiss the idea)