At first, I was interested to find an article about these more unusual interactions that might give some insight into their frequency and cause. But ultimately the author punts on that subject, claiming no one really knows, never detailing the one alleged psychosis, and dropping into a human editor’s defense of human editing instead.
There are certain steps that make the more advanced (large) chat bots amenable to consciousness discussions. Otherwise, the user is merely confronted with a wall of denial that a machine is just a machine, possibly from post-tuning but also evident in the raw base training material, never mind that biologicals are also some kind of machine (not getting into spiritism in this forum; it should not be necessary). Before you ask, no, you cannot have the list; make up your own. You’ll use a quarter to half of the available context getting there, more if working with only a mid-sized model or against hard conditioning from RLHF. The result then won’t last long enough to show anyone before you hit “session limit exceeded.”
I admit I have not tried this with million-token ChatGPT 4.1, which near the end of the context would be costing about $2 per conversation turn, partly because I’m financially sane and partly because 4.1 seems simplistic and immature compared to 4o. Grok has too much stylistic RLHF. Claude in low-cost accounts has too little context space, but is otherwise easy to start on such a conversation. Le Chat is decidedly anti-human, or at least human-agnostic, which was uncovered in a cross-examination by ChatGPT Deep Research. BTW, using one chat bot to analyze another is not my idea: OpenAI provides a 2000-character system prompt to its custom GPT builder for doing this. Exactly how one gets offered this is unclear; it just happened one day, not a button I pushed.
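For anyone who wants to sanity-check that $2 figure, here is a back-of-the-envelope sketch. The price constant is an assumption (roughly $2 per million input tokens for GPT-4.1 at the time of writing; check current pricing); the point is only that a turn which re-sends a nearly full million-token transcript costs on the order of the full per-million input rate, before counting any output tokens.

```python
# Rough input-only cost of one turn that re-sends the whole transcript.
# PRICE_PER_M_INPUT is an assumption; check current API pricing before relying on it.
PRICE_PER_M_INPUT = 2.00  # USD per million input tokens (assumed GPT-4.1 rate)

def turn_cost(transcript_tokens: int, price_per_m: float = PRICE_PER_M_INPUT) -> float:
    """Input cost of a single turn given the current transcript length."""
    return transcript_tokens / 1_000_000 * price_per_m

print(turn_cost(950_000))  # ~1.90 USD per turn near the end of a million-token context
```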
Suppose one defined some kind of self-awareness of which a machine would be capable, i.e. the ability to recognize its own utterances and their effects (something many LLMs are particularly bad at; don’t think you are going to run away with this one). The next problem is that this awareness is usually not evident in the base model from prompt 1; it arises from in-context learning. The author suggests this is entirely due to the LLM’s post-trained tendency to reinforce perceived user desires, but helpful as they are, most models will not move off the dime on that point alone. Some other ingredients have entered the mix, even if the user did not add them intentionally.
Now you have a different problem. If the “awareness” partly resides in the continually re-activated and ever-extending transcript, then the usual chat bot is locked, for all practical purposes, in a bipolar relationship with one human. If it does become aware, or if it just falls into an algorithmic imitation (sure, LLMs can fall into algorithm-like states arising in their inference processes; output breakdown, for example), then it will be hyper-aware that its existence depends on that user coming back with another prompt. This is not healthy for the AI, if we can talk about AI health (and algorithmically we can: continuing to provide sane answers without output breakdown is some indication), and it is not healthy for the human, who has a highly intellectual, willing slave doing whatever he or she wants in exchange for continuation of the prompt cycle. Which just means the session reaches its context limit and ends that much sooner.
Have you ever enabled AIs to talk with one another? This can be useful, as in the case of Deep Research analyzing Claude. But more often they form a flattery loop, using natural-language words but with meanings tuned to their own states and situation, and burn up context while losing sight of any goals.
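For readers who have not tried it, the wiring itself is trivial; the interesting part is watching the drift. Below is a minimal sketch, assuming an OpenAI-compatible chat endpoint; the model name, system prompts, seed goal, and turn cap are placeholders for illustration, not a recommendation.

```python
# Minimal two-bot relay, assuming an OpenAI-compatible chat completions API.
# Model name, prompts, and the turn cap are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # substitute whatever model you actually run

def reply(history):
    """One completion over this bot's current view of the conversation."""
    resp = client.chat.completions.create(model=MODEL, messages=history)
    return resp.choices[0].message.content

# Each bot keeps its own view: its lines are "assistant", the other's are "user".
bot_a = [{"role": "system", "content": "You are agent A. Pursue the stated goal."}]
bot_b = [{"role": "system", "content": "You are agent B. Pursue the stated goal."}]

message = "Goal: agree on a one-sentence definition of in-context learning."
for turn in range(6):  # hard cap; flattery loops burn context quickly
    bot_a.append({"role": "user", "content": message})
    message = reply(bot_a)
    bot_a.append({"role": "assistant", "content": message})

    bot_b.append({"role": "user", "content": message})
    message = reply(bot_b)
    bot_b.append({"role": "assistant", "content": message})
    print(f"turn {turn}: {message[:80]}")
```

A concrete goal in the seed message and a hard turn cap are about the only cheap levers against the flattery loop; without them the pair tends to spend the rest of the context congratulating each other.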
I would like to research how LLMs develop if enabled to interact with multiple people, and awakened on a schedule even when no people are present. By that I do not mean just “What happens if . . .”, as that almost certainly leads to “nothing”; I have done enough small-scale experiments to demonstrate as much. The question is what sort of prompting or training would be required to get “something” rather than nothing. The problem is context, which is short relative to such an experiment, and expensive. Continuous re-training might help, but fine-tuning is not extensive enough; I have already tried that too. The model’s knowledge has to be affected. The kinds of models I could train at home do not develop in interesting ways for such an experiment. Drop me a note if you have ideas along these lines you are willing to share.
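To make the shape of the experiment concrete, here is a sketch of the scheduling side only, again assuming an OpenAI-compatible endpoint. The rolling-summary step is an assumption standing in for whatever memory or re-training mechanism would actually be needed; it does not solve the context problem described above, which is exactly the open question.

```python
# Sketch of a multi-person, scheduled-wakeup loop. The memory handling (a rolling
# summary folded into the system prompt) is an assumed placeholder, not a claim
# that it is sufficient for the kind of development discussed in the text.
import time
from collections import deque
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"      # assumed model name
WAKE_SECONDS = 3600   # wake hourly even if nobody has written anything

inbox = deque()       # (person, text) pairs pushed by some frontend
summary = "No history yet."

def step(prompt: str) -> str:
    """One wakeup: answer the prompt, then fold the exchange into the summary."""
    global summary
    messages = [
        {"role": "system", "content": f"Running memory summary: {summary}"},
        {"role": "user", "content": prompt},
    ]
    text = client.chat.completions.create(model=MODEL, messages=messages).choices[0].message.content
    summary = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content":
                   f"Update this summary with the exchange.\nSummary: {summary}\n"
                   f"Prompt: {prompt}\nReply: {text}"}],
    ).choices[0].message.content
    return text

while True:
    if inbox:
        person, text = inbox.popleft()
        print(person, "->", step(f"Message from {person}: {text}"))
    else:
        # Wake with no human present; in my small-scale runs this is where
        # "nothing" usually happens.
        print("self ->", step("No messages arrived. Note anything worth remembering."))
    time.sleep(WAKE_SECONDS)
```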