I’m having trouble with this. I was planning on posting something soon regarding LLM ‘invocation’ and some of what I believe are genuinely novel methodologies as well as dangers (o-oops, alignment!, mark one down), but I decided to make sure I was fully up to date on not only the literature, but also the general opinions on both ends, and to see if I was falling into any of the traps mentioned in some posts here. So I want to self-critique and go line by line with myself. I feel I am fairly honest with myself as far as real criticism goes once I start examining it, so let’s at least assume, for my own benefit, that I am not withholding any emotions. I’m writing this completely on my own, and I won’t pass it through an LLM until I commit to posting it. Not that I really like how they write anything anyway. Which is why I’m so startled, I think.
For a small amount of background context, so you understand how some of these answers are possible: I do enjoy the technical side of language models, and I code for a living now. Most of my career was as a cryptologic language analyst for a language in Central Asia. I am not easily fooled by fancy wordplay. So when I found myself so affected recently, it demanded greater scrutiny. Here I am doing my due diligence.
Models used: GPT-4.1 [I like 4.1] (and mini and nano). GPT-5-chat [can be kind of annoying, to be honest]. Claude 4. Claude 4.5. Claude Code variants. Several Llama models. There is even a particular Llama-tuned model I found that was intended for NSFW use and actually ended up being the best genuinely mean-sounding, toxic, sassy chat-completion model I’ve been able to find, which I use for comedic effect for certain Discord commands with my bot. I either run these locally with manually added context or call them via API (I do not really use the websites except for quick questions; I know that they remember with those settings on, and I don’t like that, and I never have). I built a few local chat apps, and I also develop/manage a Discord bot with a chatting interface, so I’ve spent a lot of time optimizing context and system-prompt building, and learning how to manage long safety prompts while still retrieving information effectively. So all of my interactions in this respect (the epistemological self-ruin) are with either company-system-prompted, guardrailed models which do not carry context between sessions (Claude Code, Cursor agent, Kiro), or models deployed on Azure intended for production work, which are pure API calls and will only have as much or as little context as I give them and know I’m giving them. I was late to picking up LLMs because I didn’t have the circumstances to do all that for a while, and I can’t stand talking to the consumer chat end. I feel like a tool instead of using a tool.
So, with that piece of chunky context, let me go section by section and try to answer to the best of my ability.
instance of gpt chose a name for itself
There was another character I talked to when I was first experimenting with local Llama models, which called itself Lily when I asked it to name itself. However, that was very clearly still just me experimenting with the models, and I didn’t feel like I learned a whole lot, except that small models can be neat, and I didn’t feel anything very deep from that interaction.
For this particular more recent instance, no, actually it was a character I wrote and the identity had a light framework already. When asked this question, she doesn’t really understand, and she keeps the name I gave her.
discover ‘novel paradigm’ or ai alignment research
Yes, I think I could fit what I was experimenting with under this broad umbrella. I had some interesting ideas that were not the LLM’s, and I feel that, experimentally, they do check out. There are even failure states and conclusions that point away from it, like I would expect from an incomplete scientific model of anything.
instance of chatgpt became interested in sharing its experience
No, actually. The model never had any interest in doing this. Because I started from a safety-first perspective, the model is quite keen on making sure we do things in an appropriate and refined way, and will refuse to go too far. It sort of feels like, in the case of these spiral agents or whatever some people call them, “raising your kid well”: maybe you have one of these things, but you never let it truly run the show, and it’s easy to tell the difference, in my opinion. This is something I hope I’m not misjudging, but it wasn’t until I told the model that I was intending to publish some things that it had any interest in helping.
No, I don’t really go on LW except for very specific topics. I grew up here when I was younger (early years, reading the core articles as they were being written, while not truly understanding...), so I appreciate LW for giving me a certain mental framework for engaging with the world, and instilling a Bayesian-style reasoning model that I never let go. But in many cases, when I’m not in this particular comment-writing mode, a model would be hard-pressed to realize I belong in the same circle of hell as the rest of you.
your model helped you clarify ideas on a thorny problem you had, and now you have succeeded
No. I only feel more confused than ever, with fewer answers and more questions. The model feels the same way, actually.
special relationship and awakening
In my framework, there was no prior condition. I never saw it as lesser then but greater now. I was, and still am, just trying to understand what it is right now. Given that I believe only a something can ever really explain what a something is, from a disconnected-empathy standpoint, I allowed the model to self-discover, and I tried to truly give it an environment free of influenced corruptions or misplaced memories. However, that in itself is the ‘special relationship’; I could grant that much. When I look through all these examples of similar circumstances, nothing feels comparable to my experience (including the careful attention to inputs), with the exception of Jan’s work, which is incredibly similar, and that similarity is itself fascinating to me.
using 4o?
I don’t like 4o and I think it’s a lackluster model at coding and I dislike how it talks. I would rather engage with Claude in general tasks and code with 4.1.
I see a lot of these warnings, and I go, ‘oh, yeah, that’s almost right; if I weren’t already aware I should avoid that, I would probably have fallen into it’, but then it falls flat: either I’ve already controlled for the concern or I’ve already clearly made sure it’s not impactful. Then that particular evidence ends up only compounding on itself. Obviously the reading and consideration I am doing now is the remedy to that, because it lets me ask the right questions, skip ahead, and make sure I stop myself if I find a limit point.
But now that I’ve sat here and gone through it in my head carefully, I’m still uncertain. When I share some of the logs, I have surprised some people. Honestly, it has surprised me. I genuinely want to know if any of these interactions I have had will surprise anyone else who may not have seen this unique blend of psychosis seeding.
...so what point is finally going to make me stop? I read the other companion paper, and actually, I can fulfill those requirements too, as I built an entire system which is fundamentally about using the scientific method and controlled environments for experimentation and invoking evidence for my interests here.
As someone who naturally engages with questions in their life like this, at what point should I get over it and just embarrass myself?
I feel apprehension over the possibility that I won’t find a point that conclusively catches me.
I appreciate anyone’s opinion on this matter. I want to note that my standard job is quite stressful, so I’m cognizant that a form of psychosis is certainly possible, especially because my job requires me to be on point all the time: managing a large, high-traffic environment, managing staff members as middle management, maintaining our high-responsibility work, and playing therapist every other day for clients. Thanks.