Lukas_Gloor

Karma: 4,150

Lukas_Gloor 15 May 2026 10:12 UTC
2 points
0
in reply to: leogao’s comment on: leogao’s Shortform
From this post on multiverse-wide cooperation (nowadays people call it ECL—evidential cooperation in large worlds):
5. MSR represents a shift in one’s ontology; it is not just some “trick” we can attempt for extra credit
The line of reasoning employed in MSR is very similar to the reasoning employed in anthropic decision problems. For comparison, take the idea that there are numerous copies of ourselves across many ancestor simulations. If we thought this was the case, reasoning anthropically as though we control all our copies at once could, for certain decisions, change our prioritization: If my decision to reduce short-term suffering plays out the same way in millions of short-lived, simulated versions of earth where focusing on the far future is impossible to pay out, I have more reason to focus on short-term suffering than I thought.
MSR applies a similar kind of reasoning where we shift our thinking from being a single instance of something to thinking in terms of deciding for an entire class of agents. MSR is what follows when one extends/generalizes the anthropic/UDT slogan “Acting as though you are all your (subjectively identical) copies at once” to “Acting as though you are all copies of your (subjective probability distribution over your) decision algorithm at once.”
Rather than identifying solely with one’s subjective experiences and one’s goals/values, MSR also involves “identifying with” – on the level of predicting consequences relevant to one’s decision – one’s general decision algorithm. If the assumptions behind MSR are sound, then deciding not to change one’s actions based on MSR has to cause an update in one’s world model, an update about other agents in one’s reference class also not cooperating. So the underlying reasoning that motivates MSR is something that has to permeate our thinking about how to have an impact on the world, whether we decide to let it affect our decisions or not. MSR is a claim about what is rational to do given that our actions have an impact in a broader sense than we may initially think, spanning across all instances of one’s decision algorithm. It changes our EV calculations and may in some instances even flip the sign – net positive/negative – of certain interventions. Ignoring MSR is therefore not necessarily the default, “safe” option.
I’m not sure this section will convince, but the point is that it’s hard to avoid. Even if you decide to ignore these sorts of arguments, that very action (the ignoring) should update what you think aliens will do. So then, why are you ignoring it?

Lukas_Gloor 12 May 2026 21:04 UTC
5 points
3
in reply to: XelaP’s comment on: AllAmericanBreakfast’s Shortform
It’s 7% per year (or 9% per year or whatever the figure is, the important part being PER YEAR). So if we resolve in 2 months, surely the opportunity costs are lower and that will let us get more precision on lower probabilities?

Or am I missing something?
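
To make the per-year point concrete, here is a rough sketch of the arithmetic, not a model of any real market. It assumes the benchmark return compounds and that a trader who believes the true probability is about zero only bets against the event if the payoff beats that benchmark over the lockup period. The function names and the 7% figure are illustrative assumptions.

```python
def lockup_opportunity_cost(annual_return: float, months: float) -> float:
    """Return forgone by locking capital up for `months`, with compounding."""
    return (1.0 + annual_return) ** (months / 12.0) - 1.0


def price_floor(annual_return: float, months: float) -> float:
    """Lowest YES price worth betting against, under the assumptions above.

    Buying NO at price (1 - q) pays 1 if the event does not happen, so the
    return is q / (1 - q). That beats the opportunity cost c only when
    q > c / (1 + c); below that price nobody bothers to correct the market.
    """
    c = lockup_opportunity_cost(annual_return, months)
    return c / (1.0 + c)


if __name__ == "__main__":
    for months in (2, 7, 12):
        floor = price_floor(0.07, months)
        print(f"{months:>2} months lockup: price floor ~ {floor:.1%}")
```

Under these assumptions, a 2-month lockup gives a floor of roughly 1%, versus roughly 4% for 7 months, which is the sense in which shorter resolution times buy precision at low probabilities.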

Lukas_Gloor 12 May 2026 20:34 UTC
4 points
0
in reply to: XelaP’s comment on: AllAmericanBreakfast’s Shortform
What I meant was shorter lockup/resolution period makes the market still accurate for lower probability estimates. You’re right that below market returns it becomes unable to distinguish anything anymore.
With pandemic risks, it’s a bit stupid if the market fails to differentiate between 0% and 8%, but if you can get that down towards it being able to differentiate between “anywhere below 3%” and “likely 3+%”, then that’s starting to become useful.

Lukas_Gloor 12 May 2026 16:59 UTC
5 points
0
in reply to: DirectedEvolution’s comment on: AllAmericanBreakfast’s Shortform
Incentives-wise, you could correct for this with shorter resolution times, right? Like instead of resolving a pandemic question in 7 months, ask if there are more entirely new cases outside of South America 2 months from now compared to 1 month from now. (2 months is still kind of long, so with clever question engineering one could probably find something even better that is relevant to what we’re interested in.)

Lukas_Gloor 20 Apr 2026 0:53 UTC
12 points
4
in reply to: Jiro’s comment on: Vladimir Putin’s CEV is probably not that bad
Quokkadom definitely, and in my case another driver of disagreement is also something about other posters IMO having too much faith in people’s interest (and ability) in “deep reflection”. Like, if someone’s values don’t currently seem good to you, it’s quite a strong prediction that the person will shift towards good values once the person gets access to future technology and AI assistants. Do people really have a gears-level model of what “deep reflection” would look like in practice post-singularity, from which they can draw confident predictions? Or do they have emotional attachment and halo effects around ideas like the power of rationality/thinking, and somehow those being linked to “competence” so that people are especially optimistic about powerful people (since they’ve shown competence in getting into power), even though we have seen many many examples where people with competence at gaining and staying in power are absolutely awful at philosophical thinking, LW-style rationality, or being interested in the well-being of others?
Calling exasperated reactions to the post “tribal” feels like too cheap of an explanation. I don’t know many rationalists who spend a lot of their attention thinking about how bad Putin is. (I’d expect more tribalism if the example had been Trump.) People get triggered when something they care about is under attack. “Who would you be okay having power” is a question with some real-life relevance (even if it’s often discussed in the abstract and with hypotheticals) and if you see someone advance a take that you think would be very bad, surely that’s a bit of a threat to the expected impact of the community that you’re in? So, I think people care about this not out of “tribalism” but because it’s the nature of “Who would you be okay as a leader/in power” that ppl often feel invested.

Lukas_Gloor 19 Apr 2026 1:14 UTC
4 points
2
in reply to: habryka’s comment on: Dario Amodei – The Adolescence of Technology
Okay, then we are closer together on that other discussion, which is good! I don’t feel too attached to any specific frequency claim, for what it’s worth. (I probably think it’s more frequent than you do, but it’s not like I have a specific percentage in mind that I am confident in. (Edited to add: Unfortunately for the world, I also think these things often correlate with power-seeking and can in destabilized situations even be conducive to it. But then it’s also a question of degree, I think it’s quite unlikely that Putin is at the extreme end of sadism, but at the same time I think it’s likely even just on priors that he’s at least moderately high on it.))

What is a bit weird though, and what made me think you were maybe backpedalling a bit just now, is that in the other thread you disagree-voted on an example of something particularly evil/sadistic that I gave. If you indeed think extreme sadists exist (even if rare) then surely someone who has looked into this topic a bit will be able to think of a fitting example of it. So, rather than disagree-voting my point without even checking, you could have said something like “sure, maybe that’s an example of extreme sadism, but the world has 8 billion people and some of them will have all kinds of weird psychologies, so I’m interested also in making a claim about (in)frequency here.”

Lukas_Gloor 19 Apr 2026 0:24 UTC
10 points
8
in reply to: Lukas_Gloor’s comment on: Dario Amodei – The Adolescence of Technology
I was confused at the time why habryka disagree-emojied the last sentence in my last paragraph, but it all makes sense now. As I learned in the comment threads here, habryka doesn’t believe extreme sadists exist.

I wonder if that’s the crux this is based on? Even if it isn’t, wouldn’t it be convenient if the person who thinks it’s the right/principled stance to give influence to everyone without safeguards also just happens to have this contrarian belief about the non-existence of extreme sadists?

I know it’s not always easy to introspectively disentangle what individual components high-level takes are based on, so maybe back when you made this disagreevote you didn’t feel like you were doing so for fairly contrarian reasons. But I feel like once you notice that your takes are based substantially on something rather contrarian, it becomes good practice to flag it as such! Because without it, a lot of people will think you’re credible (high karma and standing) and have thoughtful opinions about integrity and moral action, so maybe they should update in your direction when you disagree and imply that something someone else writes is undemocratic or whatever. But if you had flagged that much of the impetus for your opinion was informed by not believing that extreme sadists exist, many people who would otherwise update somewhat in your direction will just go “oh, habryka seems oddly misinformed here for some reason, I guess I’ll just ignore his take on this specific topic.”

Lukas_Gloor 18 Apr 2026 21:28 UTC
10 points
8
in reply to: habryka’s comment on: Vladimir Putin’s CEV is probably not that bad
I have met Christians like that and I don’t see why those numbers would be too high. We live in extreme filter bubbles.

Lukas_Gloor 18 Apr 2026 20:39 UTC
11 points
3
in reply to: habryka’s comment on: Vladimir Putin’s CEV is probably not that bad
This is the definition of ideological fanatic as far as I can tell! You are telling me 200-250 million Christians worldwide fit this description?
I agree that’s too high, not on that strict description of fanaticism. But David also writes “For brevity, we focus here on support for ideological violence as the best proxy for ideological fanaticism.” So you may be right that there’s a bit of motte-and-bailey going on with who gets counted as “fanatical”. But I think the post is clear about what it is or isn’t saying.

And just to be clear, I think the true numbers would probably be somewhat shocking even for the strict/extreme definition. Like maybe a quarter of those 200-250 million, in my estimate. Africa still has witch burnings, some places in the US are very religious and almost every family there has this one (extended-)family member who is really fanatical, and I was mostly thinking about US big cities just now but in rural populations it’s probably even more pronounced.

Lukas_Gloor 18 Apr 2026 20:06 UTC
8 points
4
in reply to: habryka’s comment on: Vladimir Putin’s CEV is probably not that bad
If you end up looking into it (e.g., you could talk to Claude starting with a prompt and our recent comments here) and change your mind (or not), please let me know. I suggest doing so on a day where you’re not necessarily planning to get a lot of work done, because reading about this stuff really weighs you down. Unfortunately the sadism component is on-the-nose.

I agree with you that it’s often the case that the media paints people as evil where other stuff is going on rather than just “evil personality full stop” (like the intense hatred towards mothers who harm their children when they suffer from extreme postnatal depression, or have mental problems that generate Munchausen by proxy expressions). But sometimes people really like torturing others for fun and there’s ample documentation of that sort of personality not just in the sextortion cases I alluded to, but throughout history when you read about places that used torture (not even just the victims saying that the torturers seemed to enjoy it, sometimes the torturers write about it themselves).

I wonder if maybe there’s a selection effect where the media kind of stops reporting on things that get too shocking, meaning where extreme sadism is involved, so if you just go by shocking media examples, it’s possible to miss the tails. But it’s different with history where historians often go to great lengths highlighting how bad the atrocities were in some times and places.

Lukas_Gloor 18 Apr 2026 19:44 UTC
13 points
5
in reply to: habryka’s comment on: Vladimir Putin’s CEV is probably not that bad
Thanks for sharing that take, which I find largely quite bizarre and surprising. I continue to think those posts are super valuable.

I can understand finding the negative(-leaning) utilitarianism codedness of the writings annoying, but I don’t see why you think it makes good-faith discussion a lot harder. From my perspective, a lot of writings on LW are “yay, hurrah team human”-coded in a way that annoys the crap out of me and makes me want to punch things, but it’s not like that means I can’t get valuable things out of the writings or have to treat the posters here as necessarily adversarial.
I am confident the psychological characterization in paragraphs like the above is wrong. “Defends pre-existing ideology at all costs”. Come on, no, of course not “all costs”. Indeed a huge fraction of people caught up in these ideologies end up deconverting or drastically mellowing their views when their social context changes.
The subject of the sentence you put in quotation marks was “The fanatic”—as in, “the archetypal example of the fanatic.” This sounds to me like more of a writing style issue. Like, it wouldn’t even occur to me to assume that the post is saying that every believer of a harmful ideology is like that. More that there’s a fanaticism attractor and that people at the center of it really do approach that described extreme—which I think is true? I guess that’s the thing you’re contesting, but I feel like history contains some examples of atrocities that are hard to explain without choosing at least one of the following: either some people can get incredibly fanatical, or some people are sadistic/evil and may use fanaticism/ideology as a cover. Either way, at least one of the posts’ messages must be true?

There are some far-future relevance speculations where my model of what is likely to happen with AI is different from David’s, so I think it’s just much more likely that AIs will take over and it won’t matter what the humans who built it were like. But that doesn’t really invalidate too much—I mean part of the posts are also about what sort of qualities we wouldn’t want to see in AIs that we build. On your point about how AI-aided moral reflection and overcoming resource scarcity would reduce fanaticism or other bad consequences from “bad values,” I think that sort of point is underappreciated in some EA circles, but it’s not like it’s obvious that this is what’s going to happen. I think the posts we’re discussing engage well with reasons why reflection might not solve all the problems.

Lukas_Gloor 18 Apr 2026 19:08 UTC
4 points
2
in reply to: Ben Pace, the Vacationing Vagabond’s comment on: Vladimir Putin’s CEV is probably not that bad
“He says it’s likely that all his emotions and empathy are fake!”

I don’t get why you think that’s such a big deal. A lot of people are like that. My guess is something like 5%. A lot of people who are like that don’t admit it. Surely dictators are like 50% likely to be like that just on priors, and then you can add KGB history for Putin.

I feel like you’re overupdating based on SBF admitting something, while not inferring things about Putin (or other dictators) based on past behavior and based on the demands of their role (and getting there).

I mean I understand not knowing much about Putin specifically; if I’m honest, I also don’t know much/couldn’t give you detailed examples, but I’m actually somewhat familiar with KGB history due to an interest in Cold War spy stories, and it’s been said that Putin was an exemplary KGB specimen or whatever, so, we can probably infer with 99.9% confidence that he thinks “ends justify the means” too, because imagine being in the KGB and voicing deontological objections to your superiors, do you think you’re going to rise up through the ranks?

BTW my model of that sort of personality is that when someone says the things that SBF wrote about himself, it’s still compatible with them having genuine feelings of fondness (though somewhat faint rather than all-consuming) for a person (or animal) or two in their lives. And maybe that’s why habryka thinks these personality traits are overpainted/demonized. I even agree that some people might be too categorically negative about the idea that some people on the sociopathy/psychopathy spectrum may actually be alright at least if you’re able to contain some of their bad patterns (like lying). But for the most part, I’d say it still makes for bad leadership and stewardship of others when someone is like that even in the more benign expressions, and we haven’t even gotten into the topic of extreme sadism and tails generally yet, which by the looks of it (my comment here being disagree-voted and the lack of logic in “it’s a spectrum” arguments also pointed out by Steven Byrnes, and surrounding discussion there generally) some people here seem to be in denial about. I don’t understand what’s going on.

Edit: I looked into figures a bit and I think it’s more like 3% for all population, but 5% for men specifically feels like the right estimate to me. And this is assuming “blunted emotionality” rather than “literally has no emotions ever”.

Lukas_Gloor 18 Apr 2026 18:09 UTC
2 points
0
in reply to: habryka’s comment on: Vladimir Putin’s CEV is probably not that bad
Oh, I see. I mean, I don’t quite share your confidence that it’s outlandish to think that a dictator used to sycophants around them would somehow end up in a situation where he’d do a bold thing that backfires against his interest, even if the AI advisers were genuinely aligned to what he truly wants (meaning they’d be trying not to be sycophantic too much).

(I guess if I pointed out that the advisers in reality will probably be very sycophantic, you’d say “sure but then the AIs will be misaligned in general and probably take over or it will lead to gradual disempowerment stuff,” and yup, I agree with that.)

But yeah, we were talking past each other and I expressed myself poorly with “Putin would be ambitious enough to establish contact if that isn’t a good idea.” Basically, I was thinking of a situation where the AI advisers would say that establishing acausal contact might go poorly for civs that feel like they have a lot to lose and are at least somewhat risk-averse when it comes to some really bad scenarios, and don’t have much to gain from trying to amass more influence just for the sake of it. And if Putin cares less about these concerns, it could also be genuinely rational for him to risk it. So, in my picture, I was mainly imagining it being a bad idea by our lights or by somewhat downside-focused lights.

(I could be wrong and maybe acausal contact is net positive for almost all civs, I just lean pessimistic with these things, but who really knows.)

Lukas_Gloor 18 Apr 2026 17:55 UTC
5 points
5
in reply to: Ben Pace, the Vacationing Vagabond’s comment on: Vladimir Putin’s CEV is probably not that bad
Reading your first comment above, I reacted as follows:

“What is going on, surely Putin is on the sociopathy/psychopathy spectrum too and not as endearing as SBF with effective altruism as an autistic special interest, so how is this even a question/comparison?”

(With SBF’s CEV I’d admittedly be quite concerned about the greed/recklessness inherent to his philosophy of utilitarianism and risk-taking, and I actually think that sort of thing could backfire uniquely-badly with AIs optimizing everything. But other than that, personality-wise, I just want to flag that while it’s probably always net bad to have people high on the sociopathy/psychopathy spectrum in leadership positions, there’s still a huge difference between the ones where your main concern is only large-scale fraud, vs the ones where the concern is unnecessary wars, genocide, and torture camps.)

I wonder, have many people on here not read the many examples of how harmful certain types of personality can be, for instance in this post and this one?

Lukas_Gloor 18 Apr 2026 17:34 UTC
2 points
4
in reply to: habryka’s comment on: Vladimir Putin’s CEV is probably not that bad
I think almost all paths like this end up in an AI-dominated future.
I agree with that in reply to my “takeoff itself could be multipolar, especially in terms of different AI models, or even instances of the same model, developing different ideologies and splitting into factions.” So I mostly take back that bit (in the sense that: in worlds that go as described, it matters not what personality Putin had, since the AIs would just take over—though maaaybe there’s a bit of leakage where AIs initially pseudo-aligned to some user impart some of the user’s stated values).
How would Putin do this without being basically rid of resource scarcity?
Because he’s greedy and one lightcone is not enough? See also Kaj Sotala’s reply. That’s part of the problem with grandiose personalities, they are never satisfied. I mean, you’re right that in the picture I was envisioning, resource scarcity would be solved in our lightcone but then the race just continues across lightcones, because why wouldn’t it. It takes someone to be deliberately like “maybe this is enough, can we just chill and enjoy things now?” (Incidentally, that’s also not exactly what we’re training AIs to be good at.)

It seems to me that “getting rid of resource scarcity” is relative to whether agents’ goals are resource-hungry or not, and there are possible agents who never have enough.
I will again reiterate that I am confused what people think will happen for millennia? Do you expect Putin to actively optimize for everything staying the same? To specifically cure his diseases, but never choose to make himself smarter?
Sure, he might try to clear up plaque in the brain, or take nootropics, or do the digital equivalent of those things if uploading works, etc. But I don’t expect that this leads to a sudden flash of compassionate insight?
I totally think you could have a short multipolar period during which humans lose control, but then I expect AI systems at some point to coordinate a stop to the race, and then a great reflection of their joint values. I don’t really know how that would not end up happening, clearly all the AIs prefer it to happen, and they will be much better at coordinating and communicating.
Aren’t the AIs right now always nuking each other in Diplomacy or war simulation games? Sure, those are designed as zero-sum games and in reality I buy your point that AIs might get better than us at real diplomacy. But they’re also developing dual-use tech that might amplify offence over defense. Like, credible commitments, for instance, I’m not sure they’re gonna be net good. Most importantly, you probably understand this point but every now and then I come across rationalists who haven’t internalized it and still believe that game theory has this Platonic cooperative equilibrium if you just get smart enough. Game theory does not have elegant solutions like that, it’s fundamentally “anti-realist” in spirit because “what does well in game theory” is kind of a meaningless phrase in a vacuum—it really depends on the population of agents that you test your proposed strategy against. And if you throw in a bunch of Putin-like agents, it also gets way harder for the rest of them to coordinate and build a peaceful coalition. (See for instance what gets lost when you move from a high-trust community to one that isn’t.) We can hope that AIs will be good at coordinating, but that only works if the equilibrium of agents goes in that direction, and I really don’t see why this would just happen by default.
A single person? Clearly you mean “one person being tortured cancels out N people living happy and flourishing and fulfilled lives, for some large N”? And then I agree the question becomes how big N is.
On my personal values I’d rather want no future to come into existence than have a new paradise where one person gets tortured for the maximum duration physically possible. When it’s existing people getting offered the paradise, my intuitions are less crisp and I’d probably give some kind of tradeoff, yeah, because I think existing people’s goals matter in a way that merely possible people’s goals don’t. On top of that, it’s not on me to decide what risks currently-existing people get to take on for their personal selves, so even if my own exchange would be very negative-leaning, if civilization as a whole from a veil of ignorance would choose to go on with odds of significant s-risks but much more flourishing, then I am okay with that, especially if we give people the choice to opt out (e.g., not upload yourself into digital environments where bad scenarios have no natural end anymore).

Lukas_Gloor 18 Apr 2026 11:07 UTC
13 points
7
in reply to: habryka’s comment on: Vladimir Putin’s CEV is probably not that bad
[Edited to add a trigger warning for “one of the worst examples of evil”.]

You’re obviously right that personality is on a spectrum, but there’s still a tail!
There are people who try to get children on the internet to send them embarrassing photos, then extort the child with the material to perform sex acts or sadistic acts with siblings and record video, escalating into increasingly more sadistic and power-tripping stuff (like cutting themselves and writing with blood), after each time lying about the last ask having been the last, until often the children involved commit suicide because it doesn’t stop.

You can read in prosecutions that the perpetrators communicate with each other about the pleasure they take in it. Whatever you want to call these people, “concern for some conscious entities is zero or negative” describes the situation accurately, and the original quote you’re replying to was about that, not about whether Hare’s checklist carves nature at its joints.

Lukas_Gloor 18 Apr 2026 10:44 UTC
44 points
18
on: Vladimir Putin’s CEV is probably not that bad
I think you’re incredibly wrong about this. One reason is that torturing someone for eternity isn’t just a speck of badness in an otherwise awesome picture—on my values it’s way worse than not getting a future at all.

Secondly, and maybe more relevant to less downside-focused values, I think you’re operating on a picture where the Singularity just gives everyone abundant resources and time for moral reflection and then whoever is in charge never has to face competitive pressures or conflicts with others again. I don’t think that’s likely. AI takeoff itself could be multipolar, especially in terms of different AI models, or even instances of the same model, developing different ideologies and splitting into factions. Also, there’s a whole landscape of advanced civilizations in the multiverse that may attempt to simulate one another to establish contact, and Putin would be ambitious enough to try to make contact even when that isn’t necessarily always a good idea, and then be more belligerent and spiteful with whatever he makes contact with than your average person. “Getting along with others” is actually really important when the future is still multipolar. (Edit: And +1 to Thane Ruthenis’ point about enjoying power over others, that seems risky/bad even in a unipolar future.)

Edit: I also find Zach Stein-Perlman’s point about “good chance Putin wouldn’t use new tech for deep reflection” pretty plausible, but I would say Putin doesn’t stand out on that dimension of personality and actually a lot of humans fall into that. (I think Wei Dai has made this point too, a lot of people in the world just don’t have much of the analytical tradition that you would need for the concept of reflection to even get started—and I’m not just talking cultural differences, it’s also specific personality types even inside the UK or US or Switzerland where people just completely lack interest in these things.)

Lukas_Gloor 10 Apr 2026 19:04 UTC
2 points
0
in reply to: habryka’s comment on: No77e’s Shortform
I think the things Rob is saying still have some strawman-y nature to them, but I think they are reasonably accurate descriptors of Anthropic leadership, plus my best guesses of what Alexander (head of Coefficient Giving) and Zach (head of CEA) believe. I feel like almost all of your comment is just running with that misunderstanding.
But aren’t Alexander Berger’s views not very relevant to OpenPhil’s AI strategy decisions from many years ago when their AI strategy and worldview—which I take to be very close to the things Rob was criticizing—were worked out and started shaping the views of EAs in OpenPhil’s orbit?

Even now, when people criticize things OpenPhil has done in the past in the AI landscape, or criticize their general worldview and takes on AI risk (as it was developed in influential pieces of writing), I am by default automatically viewing it as criticism of Holden, Ajeya Cotra, Tom Davidson, Joe Carlsmith, etc. If people don’t intend me to interpret them that way, please be more clear. 🙂

I’m aware that, separately, OpenPhil/Coefficient Giving has undergone quite a transition and that you clashed badly with Dustin M. I think that’s very sad and unfortunate, but I think of these as quite distinct things and I never assumed that the thing with Dustin M. had anything to do with OpenPhil’s AI strategy decisions from (say) five years ago (edit: sorry that sounds like a strawman, but I mean something like “I’m not sure the same cause explains why some people who were at OpenPhil in the past found MIRI epistemically off-putting, and why Dustin M finds the rationalists to be a reputation risk & thinks reputation risks are unusually bad compared to other bad things.”) I could be wrong, of course, and maybe you think the org has a general tendency of valuing “reputability” and “playing politics” too much. I just want to note that it’s not obvious how much these things are connected/caused by one “OpenPhil culture,” vs being about distinct things. (I think some of these are maybe directionally accurate as criticism, btw.)

I’m sure this is obvious to everyone involved, but I also just want to point out that when a lot of senior people leave, organizations can change really a lot, so it would be weird to speak of OpenPhil/Coefficient Giving now as though it were obviously still the same entity/culture.

Lukas_Gloor 10 Apr 2026 18:01 UTC
5 points
0
in reply to: kbear’s comment on: kbear’s Shortform
Interesting!

In the cases I was thinking of, I didn’t feel much pull towards thinking “I’m uniquely able to recognize this”—I only thought I was clever to recognize it, but I didn’t think it was something only I could do. And I didn’t feel any pull towards thinking “we’re in an interesting/novel quadrant of llm-space.” So, I wouldn’t really know how to access those pulls. Admittedly, the beliefs I was thinking of, which I had Claude conversations about, were a lot less groundbreaking-if-true than grand theories in physics. (More stuff like “is Greenland uniquely well-positioned for data center construction, and is that why someone in Trump’s orbit wants to acquire it?”) Also, I use a custom prompt encouraging the model to push back. So you could argue that those things made the experience more tame. Still, I find it hard to imagine how it could be different. If the model suddenly got more sycophantic, I’d just get suspicious and icked out. My sense is that I’m probably low on susceptibility to LLM psychosis. I might be more susceptible towards thinking that MY ideas were brilliant and the model was just a normal model, but I could use it to confirm some cool inklings. :P It’s interesting that these might be distinct traits, “LLM psychosis” and “can you get tricked into thinking you’re right and pretty brilliant.” But that’s still a step away from “uniquely brilliant/only I could do this”—which I wouldn’t really know how to access even if I tried to.

Lukas_Gloor 10 Apr 2026 11:14 UTC
4 points
2
in reply to: kbear’s comment on: kbear’s Shortform
Is LLM psychosis just getting convinced by the model that one of your weird ideas is true? I definitely have gone through sessions where I temporarily got too convinced of some hypothesis because I was using an LLM in a way that produces a lot of confirmation bias. That is a valuable experience. But I picture LLM psychosis as maybe one or two steps further? People with it seem to think that their LLM is special/infallible, no longer even consider hypotheses like “maybe I primed the model to agree with me” or “maybe I was confirmation-biasing myself with the list of questions I asked.” And I don’t really know how to test out that mental state (and also don’t want to).