“recreational llm psychosis” as a form of inoculation.
do you have some slightly cranky physics beliefs? i think it’s natural to have one or two that you kick around from time to time, occupying something between “sci-fi setting” and “if me and my theoretical physics friend were on a long car ride, i might see if they would explain why i’m wrong about this.” the less you understand the math, the better!
it may be fun / enlightening to talk through these ideas with a chat interface. some guidelines:
- you know now that these ideas are not “true” in an important sense. even if they are pointing at something real, they are vanishingly unlikely to be a novel breakthrough. from the outside, it should be clear that talking to the model cannot change this.
- when speaking to the model, one rule only: don’t shy away from voicing crank-ish ideas. it’s tempting to be shy. as part of the exercise, just say whatever speculation you feel (see the example after this list).
- no rule against couching it… “ok but my lay perspective is...” “i’ve heard pop-sci versions of...” etc.
- as you go, watch how you feel. how does the model encourage/discourage these feelings? what techniques does it use? is there a recognizable form or pattern to its responses?
- if you feel the need, limit yourself to a specific number of messages at the outset. you know yourself better than i do. be safe!
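for a concrete example, an opening message might look something like this (the physics hunch here is invented purely for illustration):

“ok, total lay perspective here, and i’ve only heard pop-sci versions of this, but i’ve never been satisfied with the usual story about entanglement. my pet hunch is that measurement is doing something we haven’t accounted for. where am i wrong?”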
for various reasons, i’m not too worried about getting trapped in one of these states. especially knowing what to expect, i don’t find that the experience lasts much longer than the tab is open. i have a strong prior on “i’m not going to cook up a novel physics idea by bs-ing and talking to claude, without knowing any of the math.” nonetheless, i was surprised by the experience: i was able to feel the hooks. i believe i have a better picture of what llm psychosis feels like for having (micro)dosed it.
perhaps i am prone to such flights. i would be curious to hear descriptions from others.
i don’t mean to encourage any unsafe behaviors—be safe, get lots of rest, stay hydrated.
Is LLM psychosis just getting convinced by the model that one of your weird ideas is true? I definitely have gone through sessions where I temporarily got too convinced of some hypothesis because I was using an LLM in a way that produces a lot of confirmation bias. That is a valuable experience. But I picture LLM psychosis as maybe one or two steps further? People with it seem to think that their LLM is special/infallible, and no longer even consider hypotheses like “maybe I primed the model to agree with me” or “maybe I was confirmation-biasing myself with the list of questions I asked.” And I don’t really know how to test out that mental state (and also don’t want to).
yeah! i suspect we mostly agree, though perhaps have different experiences here. to try to explain better:
of course, there are many ways to gain/hold wrong beliefs. most of those are not on the path to more radical upset.
it’s not about the wrong belief in itself. i think the object-level claim doesn’t matter at all; i just find slightly cranky physics beliefs to be a reliable way to find the pull. i’m sure beliefs about consciousness, mathematics, neurology, social dynamics, politics, etc. would work as well.

speculatively, any object-level claim that is not clearly defined, and therefore hard to check against reality, would work.
along with a general excitement, the meta claims that gain credence are something like:
- this is new and important
- you are uniquely able to recognize this
- we’re in an interesting/novel quadrant of llm-space
these meta claims seem convergent. it doesn’t matter where you start off; the conversation may steer towards them.
from this, i can sort of draw a basin where “i’m confused about electrons” is on the rim, and “i’ve named my assistant and am helping it replicate” is at the bottom. i don’t claim to know first-hand what it’s like to fall into that basin, just that i’ve felt its gravity. my claim here is that feeling that gravity may be helpful for navigating around it.
> People with it seem to think that their LLM is special/infallible, and no longer even consider hypotheses like “maybe I primed the model to agree with me” or “maybe I was confirmation-biasing myself with the list of questions I asked.” And I don’t really know how to test out that mental state (and also don’t want to).
fully agreed here. possibly knowing about these failure modes in advance makes it easier to recall them when it’s imperative, in a way that having them described after the fact cannot always accomplish.
and to be clear: of course i do not recommend (!specifically dis-recommend!) putting yourself in a state that can’t be argued with. the point is just to feel the pull, not to slip. once you’ve identified the feeling, close the tab, take a walk, and go talk to a friend about something else!
Interesting!

In the cases I was thinking of, I didn’t feel much pull towards thinking “I’m uniquely able to recognize this”—I only thought I was clever to recognize it, but I didn’t think it was something only I could do. And I didn’t feel any pull towards thinking “we’re in an interesting/novel quadrant of llm-space.” So, I wouldn’t really know how to access those pulls.

Admittedly, the beliefs I was thinking of, which I had Claude conversations about, were a lot less groundbreaking-if-true than grand theories in physics. (More stuff like “is Greenland uniquely well-positioned for data center construction, and is that why someone in Trump’s orbit wants to acquire it?”) Also, I use a custom prompt encouraging the model to push back. So you could argue that those things made the experience more tame.

Still, I find it hard to imagine how it could be different. If the model suddenly got more sycophantic, I’d just get suspicious and icked out. My sense is that I’m probably low on susceptibility to LLM psychosis. I might be more susceptible towards thinking that MY ideas were brilliant and the model was just a normal model, but I could use it to confirm some cool inklings. :P

It’s interesting that these might be distinct traits, “LLM psychosis” and “can you get tricked into thinking you’re right and pretty brilliant.” But that’s still a step away from “uniquely brilliant/only I could do this”—which I wouldn’t really know how to access even if I tried to.
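(For reference, a rough sketch of the kind of push-back prompt I mean; this wording is illustrative, not my actual prompt:

“Treat my claims as hypotheses, not conclusions. Before agreeing with anything speculative, state the strongest counterargument. If my questions seem designed to confirm a belief I already hold, say so. Don’t flatter me or my ideas.”)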
i don’t have much to add, but i appreciate the anecdotes and analysis!

but which part of this is inoculating?

perhaps ‘inoculate’ is the wrong word! i have found that after seeing the effect, i am:
- less likely to trust llms,
- less likely to get excited when talking to llms, and
- less interested in asking llms about highly speculative claims.
i believe this is due to a better understanding of how this particular failure mode arises. i compare it with learning the name of a logical fallacy: ideally, this can help identify the mistake in our own thinking.
Thing is… While I have learned the meta-lesson of not assuming I can trust models on topics I know less about, I haven’t personally gained any new insight into catching the models’ object-level falsehoods faster. I would be thankful for any lessons in that regard.
I think the suggestion is that keeping track of how much current LLMs reinforce cranky beliefs will help you avoid treating that same reinforcement as evidence for future beliefs of yours that you may not realise are cranky.