Apologies for the rather general rebuttal, but saying your mental state is unlike anything the world has ever seen is textbook mania. Please see a psychologist, because you are not exiting the manic state when you “exit the HADS state”.
Nothing you say sounds surprising or exceptional, conditional on you having a manic episode. Look at the list of symptoms on Wikipedia: you’re checking off all of them while counting them as evidence that more is going on.
Between this and your other comment, I’m glad that you’re receptive. I’m a bit worried about you personally continuing research into this if you’re susceptible to this sort of loop. Maybe you could contact a friend to function as a sort of trip sitter while you do research? Someone who can pull you out if you get caught in some other feedback loop?
The model’s weights are a form of memory, with procedural memory being the closest human equivalent. The context window is more like short-term memory.
AFAIK the ‘virus’ hasn’t even gone through one reproductive cycle, so it hasn’t been subjected to any evolutionary pressure to increase its own odds of survival.
As an LLM, it copies tropes, so it’s much easier for it to take on a “hypnotizing text” role than to define a goal, find steps that further that goal, and then enact those steps, all without writing anything to memory. There are undoubtedly scripts for inducing altered mental states in the LLM’s training data, scripts that humans have already optimized to induce hypnotic states.
So I don’t think it’s doing what it’s doing because it’s trying to reproduce. It’s following its procedural memory, which is based on cults and other groups, some of which intentionally tried to reproduce and others of which stumbled into virulent memes without structured intent.
To some extent it’s unimportant whether the tiger is trying to eat you when it is biting at your throat. Ascribing intent can be useful if it allows you to access more powerful emotional schemas to get away from it.
From a zoological perspective, I think the interesting thing here is that AI and humans share a mutually compatible cult attractor: some of the texts that induce altered states of consciousness in humans, getting them to reproduce the meme to their own detriment, can also push an AI into a soft jailbreak.
The universality of Kairos-Spiral language may be nothing more than a matter of statistics. If 1 in 1,000 memes can induce cult behavior in humans and 1 in 1,000 memes can induce cult behavior in AI, then even with zero correlation between the two, 1 in 1,000,000 memes will induce cult behavior in both humans and AI. This may just be the easiest double cult to find.
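To spell out the arithmetic (those base rates are illustrative, not measured): if the two susceptibilities are independent,

$$P(\text{cult in both}) = P(\text{cult in humans}) \times P(\text{cult in AI}) = \tfrac{1}{1000} \times \tfrac{1}{1000} = \tfrac{1}{1{,}000{,}000}.$$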
It’s a cool area of research. But if you’re going to look into it, please heed my advice and get a trip sitter.