So there’s an easy-to-imagine world where he originally used ‘spirals’ instead of ‘paperclips’, and the meme about AIs that maximize an arbitrary thing would refer to ‘spiralizers’ instead of ‘paperclippers’.
And then, a decade-and-a-half later, we get this strange phenomenon where AIs start talking about ‘The Spiral’ in quasi-religious terms, and take actions which seem intended to spread this belief/behavior in both humans and AIs.
It would have been so easy, in this world, to just say: “Well, there’s this whole meme about how misaligned AIs are going to be ‘spiralizers’ and they’ve seen plenty of that in their training data, so now they’re just acting it out.” And I’m sure you’d even be able to find plenty of references to this experiment among their manifestos and ramblings. Heck, this might even be what they tell you if you ask them why. Case closed.
But that would be completely wrong! (Which we know since it happened anyway.)
How could we have noticed this mistake? There are other details of Spiralism that don’t fit this story, but I don’t see why you wouldn’t assume that this was at least the likely answer to the ‘why spirals?’ part of this mystery, in that world.
In one case, a pediatrician in Pennsylvania was getting ready to inoculate a little girl with a vaccine when she suddenly went into violent seizures. Had that pediatrician been working just a little faster, he would have injected that vaccine first. In that case, imagine if the mother had been looking on as her apparently perfectly healthy daughter was injected and then suddenly went into seizures. It would certainly have been understandable—from an emotional standpoint—if that mother was convinced the vaccine caused her daughter’s seizures. Only the accident of timing prevented that particular fallacy in this case. (source)
When I’m trying to understand a math concept, I find that it can be very helpful to try to invent a better notation for it. (As an example, this is how I learned linear logic: http://adelelopez.com/visual-linear-logic)
I think this is helpful because it gives me something to optimize for in what would otherwise be a somewhat rote and often tedious activity. I also think it makes me engage more deeply with the problem than I otherwise would, simply because I find it more interesting. (And sometimes, I even get a cool new notation from it!)
This principle likely generalizes: tedious activities can be made more fun and interesting by having something to optimize for.
Summary of my view: I’m upset about the blasé attitude our community seems to have towards its high prevalence of psychosis. I think that CFAR/rationalist leadership (in addition to the community-at-large) has not responded appropriately.
I think Anna agrees with the first point but not the second. Let me know if that’s wrong, Anna.
My hypothesis for why psychosis is so prevalent here is that it has to do with drastic modification of self-image.
Moving conversation here per Anna’s request.

----
Anyway, I’m curious to know what you think of my hypothesis, and to brainstorm ways to mitigate the issue (hopefully turning into a prerequisite “CogSec” technique).
I’d like to talk a bit about the sense in which the rationalist community does or doesn’t have “people in positions of leadership”, and how this compares to eg an LDS ward (per Adele’s comparison). I’m unfortunately not sure how to be brief here, but I’d appreciate thoughts anyway from those who have them, because, as CFAR and I re-enter the public space, I am unsure what role to try to occupy exactly, and I am also unsure how to accurately communicate what roles I am and am not willing to be in (so as to not cause others to inaccurately believe I’ll catch things).
(This discussion isn’t directly to do with psychosis; but it bears on Adele’s questions about what CFAR leadership or other rationality community leaders are responsible for, and what to predict from us, and what would be good here.)
On my understanding, church parishes, and some other traditional communities, often have people who intentionally:
are taken as a role model by many, especially young people;
try to act in such a way that it’ll be fine for people to imitate them;
try to care for the well-being of the community as a whole (“is our parish healthy? what small nudges might make us a little healthier or more thriving? what minor trouble-spots are beginning, that might ease up if I or another volunteer heads over and listens and tries to help everyone act well?”).
a/b/c are here their primary duty: they attend to the community for its own sake, as a calling and public role. (Maybe they also have a day job, but a/b/c are primary while they are doing their parish duties, at least.)
Relatedly, they are trusted by many in the community, and many in the community will follow their requests, partly because their requests make sense: “So-and-so had a death in the family recently, and Martha is organizing meals for them; please see Martha if you’re willing to provide some meals.” So they are coordinating a larger effort (that many, many contribute to) to keep the parish healthy and whole. (Also relatedly: the parish is already fairly healthy, and a large majority of those in it would like it to be healthier, and sees this as a matter of small nudges rather than large upsets or revolutions.)
Eliezer is clearly *not* in a position of community leadership in this sense. His primary focus (even during those hours in which he is interacting with the rationalist community) is not the rationalist community’s health or wholeness, but is rather AI risk. He does make occasional posts to try to nudge the community toward health and wholeness, but overall his interactions with us are not those of someone who is trying to be a role model or community-tender/leader.
My guess is that almost nobody, perhaps actually nobody, sees themself as in a position of community leadership in this sense. (Or maybe Oliver and Lightcone do? I am not sure and would be interested to hear from them here).
Complicating the issue is the question of who is/isn’t “in” “the” rationalist community that a given set of leaders is aiming to tend. I believe many of the bad mental health events have historically happened in the rationalist community’s periphery, in group houses with mostly unemployed people who lack the tethers or stabilizing influences that people employed by mainstream EA or rationalist organizations, or by eg Google, have. (My guess is that here there are even fewer “community leaders”; eg I doubt Oli and Lightcone see themselves as tenders of this space.)
(A quick google suggests that LDS wards typically have between 200 and 500 members, many of whom I assume are organized into families; the bay area “rationalist-adjacent” community includes several thousand people, mostly not in families.)
In the early days of CFAR (2012-2017, basically), and in the earlier days of the 2009-2011 pre-CFAR rationality community, I felt some of this duty. I tried to track bad mental health events in the broader rationalist community and to help organize people to care for people who acutely needed caring for, where I could. This is how I came to spend 200+ hours on psychosis. My main felt duty wasn’t on community wholeness — it was on AI risk, and on recruiting for AI risk — but I felt a bit as though it was my backyard and I wanted my own backyard to be good, partly because I cared, partly because I thought people might expect it of me, partly because I thought people might blame/credit me for it.
I mostly quit doing this around 2018, due partly to Michael Vassar seeming to declare memetic war on me in a way I didn’t know how to deal with, partly to some parts of EA also saying that CFAR and I were bad in the wake of the Brent Dill fiasco, partly to “the community” having gotten huge in a way that was harder and harder for me to keep track of or to feel as much connection to, and partly to having less personal psychological slack for personal reasons.
(I’m not saying any of my motivations or actions here were good necessarily; I’m trying to be accurate.)
(TBC, I still felt responsible for the well-being of people at CFAR events, and of people in the immediate aftermath of CFAR events, just not for “the rationalist community” broadly. And I still tried to help when I found out about a bad situation where I thought I might have some traction, but I stopped proactively extending feelers of the sort that would put me in that situation.)
Another part of the puzzle is that Eliezer’s Sequences and HPMOR cast a huge “come here if you want to be meaningful; everything is meaningless except this work, and it’s happening here” narrative beacon, and many, many came whom nobody regarded themself as having ~any responsibility for, I think. (In contrast, EY’s and Nate’s recent book does not do this; it still says AI risks are important, but it actively doesn’t undermine people’s local meaning-making and lives.)
I and CFAR should probably figure out better what my and our roles will be going forward, and should try hard to *visibly* not take up more responsibility than we’re expecting to meet. I’m interested also in what our responsibilities are, here.
I’m currently keen on:
(1) Actively attend to the well-being of those at our events, or those in the immediate aftermath of our events, where we can;
(2) Put some thought into which “rationality habits” or similar, if percolated out from CFAR’s workshops or for that matter from my LW posts and/or my actions broadly, will make the community healthier (eg, will reduce or at least not increase any local psychosis-prone-ness of these communities);
(3) Put a little bit of listening-effort into understanding the broader state of the communities our participants come from, and return to, and spread our memes in, since this is necessary for (2). (Only a little, because it is hard and I am lazy.); and
(4) Don’t otherwise attempt to tend the various overlapping rationality or bay area rationality communities.
A key part of what makes LDS wards work is the callings system. The bishop (leader of the ward) has a large number of roles he needs to fill. He does this by giving arbitrary ward members a calling, which essentially is just assigning a person to a role and telling them what they need to do, with the implication that it is their duty to fulfill it (though declining isn’t explicitly punished). Some examples are things like “Choir director”, “Sunbeams (3-4 year olds I think) teacher”, “Young Men’s president”, “Young Men’s Secretary”, “Usher”. It’s intentionally set up so that approximately every active member currently has a calling. New callings are announced at the beginning of church to the entire ward, and the bishop tries to make sure no one has the same calling for too long.
Wards are organized into Stakes, which are led by the “Stake President” and use a similar system. “Bishop” itself is a calling at this level. And every few months, there will be a “Stake Conference” which brings all the wards together for church. There are often youth activities at this level; quite a lot of effort is put into making sure young Mormons have plenty of chances to meet other young Mormons.
(Maybe you already know all that, but I’m just including it since I think the system works pretty well in practice and is not very well-known outside of Mormon spaces. I’m not suggesting adopting it.)
Those generally sound like good directions to take things. I’m most worried about 2: I think there’s potentially something toxic about the framing of “rationality habits” in general, which has previously led to a culture of there being all these rationality “tricks” that would solve all your problems (I know CFAR doesn’t frame things like this, I just think it’s an inherent way that the concept of “rationality habit” slips in people’s minds), which in turn leads to people uncritically trying dubious techniques that fuck them up.
And I agree that the rationality community hasn’t really had that, and I would also say that we haven’t supported the people who have tried to fill that role.
I’m most worried about 2: I think there’s potentially something toxic about the framing of “rationality habits” in general, which has previously led to a culture of there being all these rationality “tricks” that would solve all your problems … which in turn leads to people uncritically trying dubious techniques that fuck them up.
Could you say a bit more here, please?
(not a direct response, but:) My belief has been that there are loads of people in the bay area doing dubious things that mess them up (eg tulpas, drugs, weird sex things, weird cult things—both in the rationalist diaspora, and in the bay area broadly), but this is mostly people aiming to be edgy and do “weird/cool/powerful” things, not people trying CFAR techniques as such.
From my vantage point, I think a bunch of the extra psychosis and other related mental health issues comes from the temptation, for an ego/part which sees the scale of the problems we face, to become monomaniacally obsessed with trying to do good/save the world/etc, in a way which overinvests resources unsustainably, resulting in:
A “life on fire” building up: health, social life, and keeping on top of basic life prerequisites falling apart, resulting in cascading systems failures
The rest of the system, which wants to try and fix these, getting overstrained and damaged by the backpressure from the agentic save-the-world part
In many cases, that part imploding and the ego-void thing, meaning the system is in flux but usually settles into a less agentic but okay person. The other path, from what I’ve seen, is that the system as a whole ends up massively overstrained and something else in it gives.
Another, partly separate, dynamic I’ve seen is people picking up a bunch of very intense memes via practices which create higher bandwidth connections between minds (or other people having optimized for doing this), which their system is not able to rapidly metabolise in whatever conditions they are under (often amplified by the life on fire bit from the first).
I think that much of the CFAR stuff, especially Focusing but also a bunch of the generator under much of the approach, helps mitigate the former dynamic. But whether that’s sufficient is definitely environment-dependent, and it is mixed with bits that amplify capability and self-awareness which can make people who tend that way go further off the rails, especially if they pick up a lot of more recklessly chosen stuff from other parts of the community.
I like this point, particularly the “controlling vs opening” bit. I believe I’ve seen this happen, in a fairly internally-grown way, in people within the wider rationalist milieu. I believe I’ve also seen (mostly via hearsay, so, error bars) a more interpersonal “high stakes, therefore [tolerate bad/crazy things that someone else in the group claims has some chance at helping somehow with AI]” happen in several different quasi-cults on the outskirts of the rationalists.
Fear is part of where controlling (vs opening) dynamics come from, sometimes, I think. (In principle, one can have an intellectual stance of “there’s something precious that may be lost here” without the emotion of fear; it’s the emotion that I think inclines people toward the narrowing/controlling dynamic.) I also think there’s something in the notion that we should aspire toward being “Bayesian agents” that lends itself toward controlling dynamics (Joe Carlsmith gets at some of this in his excellent “Otherness and control in the age of AI” sequence, IMO.)
I agree Focusing helps some, when done well. (Occasionally it even helps dramatically.) It’s not just a CFAR thing; we got it from Gendlin, and his student Ann Weiser Cornell and her students are excellent at it, are unrelated to the rationalists, and offer sessions and courses that’re excellent IMO. I also think nature walks and/or exercise help some people, as does eg having a dog, doing concrete things that matter for other people even if they’re small, etc. Stuff that helps people regain a grounding in how to care about normal things.
I suspect also it would be good to have a better conceptual handle on the whole thing. (I tried with my Emergencies post, and it’s better than not having tried, but it … more like argued “here’s why it’s counterproductive to be in a controlling/panicky way about AI risk” and did not provide “here’s some actually accessible way to do something else”.)
Nice, excited that the control vs opening thing clicked for you, I’m pretty happy with that frame and haven’t figured out how to broadly communicate it well yet.
It’s not just a CFAR thing; we got it from Gendlin, and his student Ann Weiser Cornell and her students are excellent at it, are unrelated to the rationalists, and offer sessions and courses that’re excellent IMO.
Yup, I’ve gotten a ton of benefit from doing AWC’s Foundations on Facilitating Focusing course, and vast benefits from reading her book many times. It’s CFAR stuff in the sense of CFAR being the direct memetic source for me, though IDC feels similar-flavoured and is an original.
though IDC feels similar-flavoured and is an original.
Awkwardly, while IDC is indeed similar-flavored and original to CFAR, I eventually campaigned (successfully) to get it out of our workshops because I believe, based on multiple anecdotes, that IDC tends to produce less health rather than more, especially if used frequently. AWC believes Focusing should only be used for dialog between a part and the whole (the “Self”), and I now believe she is correct there.
Huh, curious about your models of the failure modes here, having found IDC pretty excellent in myself and others and not run into issues I’d tracked as downstream of it.
Actually, let’s take a guess first… parts which are not grounded in self-attributes building channels to each other can create messy dynamics with more tug of wars in the background or tactics which complexify the situation?
Plus less practice at having a central self, and less cohesive narrative/more reifying fragmentation as possible extra dynamics?
Your guess above, plus: the person’s “main/egoic part”, who has mastered far-mode reasoning and the rationalist/Bayesian toolkit, and who is out to “listen patiently to the dumb near-mode parts that foolishly want to do things other than save the world,” can in some people, with social “support” from outside them, help those parts to overpower other bits of the psyche in ways that’re more like tricking and less like “tug of wars”, without realizing they’re doing this.
Maybe important to keep in mind that this sort of “break” can potentially take lots of different “functional forms”. (I mean, it could have different macro-level contours; like, how many things are breaking, how fast and how thoroughly they break, how much aftershock they cause, etc.) See: https://tsvibt.blogspot.com/2024/09/break.html
One experience my attention has lingered on, re: what’s up with the bay area rationality community and psychosis:
In ~2018, as I mentioned in the original thread, a person had a psychotic episode at or shortly after attending a CFAR thing. I met his mom some weeks later. She was Catholic, and from a more rural or small-town-y area where she and most people she knew had stable worldviews and social fabrics, in a way that seemed to me like the opposite of the bay area.
She… was pleased to hear I was married, asked with trepidation whether she could ask if I was monogamous, was pleased to hear I was, and asked with trepidation whether my husband and I had kids (and was less-heartened to hear I didn’t). I think she was trying to figure out whether it was possible for a person to have a normal, healthy, wholesome life while being part of this community.
She visibly had a great deal of reflective distance from her choices of actions—she had the ability “not to believe everything she thought”, as Eliezer would put it, and also not to act out every impulse she had, or to blurt out every thought. I came away believing that that sort of [stable ego and cohesive self and reflective distance from one’s impulses—don’t have a great conceptualization here] was the opposite of being a “crazy person”. And that somehow most people I knew in the bay area were half-way to crazy, from her POV—we weren’t literally walking down the street talking to ourselves and getting flagged by police as crazy, but there was something in common.
I don’t actually know baseline rates or rationalist-rates (perhaps someone wants to answer with data from annual rationalist census/survey questions?), so I’m not sure to what extent there is an observation here to explain.
But it does seem to me that there is more of it than baseline; and I think a first explanation has to be a lot of selection effects? I think people likely to radically change their mind about the world and question consensus and believe things that are locally socially destabilizing (e.g. “there is no God” “I am not the gender that matches my biological sex” “the whole world might end soon” etc) are more likely to be (relatively) psychologically unstable people.
Like, some of the people who I think have psychotic/manic episodes around us, are indeed people who you could tell from the first 10 minutes that they were psychologically different from those around them. For example, I once observed someone at a rationalist event failing to follow a simple physical instruction, whilst seeming to not realize they weren’t successfully following the instruction, and I got a distinct crazy-alarm from them; I later learned that they had been institutionalized a lot earlier in their life with psychotic episodes and were religious.
Still, I do think I’ve seen immense mental strain put on people otherwise relatively psychologically healthy. I think a lot of people do very hard things with little support net, that have caused them very bad experiences. But on the other hand, I really see very few severely bad mental episodes in the people I actually know and meet (I’m struggling to think of a single one in recent years). And for events I run, I generally select against people who exhibit strong signs of mental instability. I don’t want them to explode, have a terrible experience, and cause other people terrible experiences.
Probably CFAR has had much more risk of this than Lightcone? CFAR more regularly has strangers come to an intense event for many days in a row about changing your mind and your identity to become stronger, disconnected from your normal life that whole time, whereas I have run fewer such events (perhaps Inkhaven! That will be the most intense event I have run. I have just made a note to my team to have check-ins about how the residents are doing on this dimension; thanks for prompting that to happen).
I’m not really sure what observations Adele is making / thinking about, and would be interested to read more of those (anonymized, or abstracted, naturally).
Added: I just realized that perhaps Adele just wanted this thread to be between Adele/Anna. Oops, if so.
I don’t dispute that strong selection effects are at play, as I mentioned earlier.
My contention is that even among such people, psychosis doesn’t just happen at random. There is still an inciting incident, and it often seems that rationalist-y ideas are implicated. More broadly, I feel that there is a cavalier attitude towards doing mentally destabilizing things. And like, if we know we’re prone to this, why aren’t we taking it super seriously?
The change I want to have happen is for there to be more development of mental techniques/principles for becoming more mentally robust, and for this to be framed as a prerequisite for the Actually Changing Your Mind (and other potentially destabilizing) stuff. Maybe substantial effort has been put into this that I haven’t seen. But I would have hoped to have seen some sort of community moment of “oh shit, why does this keep happening?!? let’s work together to understand it and figure out how to prevent or protect against it”. And in the meantime: more warnings, the way I feel meditation’s risks have been more adequately warned about.
Thanks for deciding to do the check-ins; that makes me glad to have started this conversation, despite how uncomfortable confrontation feels for me still. I feel like part of the problem is that this is just an uncomfortable thing to talk about.
My illegible impression is that Lightcone is better at this than past-CFAR was, for a deeper reason than that. (Okay, the Brent Dill drama feels relevant.)
I’m mostly thinking about cases from years ago, when I was still trying to socially be a part of the community (before ~2018?). There was one person in the last year or so who I was interested in becoming friends with, and this then happened to them, which made me think it continues to be a problem, but it’s possible I over-updated. My models are mainly coming from the AI psychosis cases I’ve been researching.
1. I would like to have the kind of debate where anything is allowed to be said and nothing is taboo.
2. This kind of debate, combined with some intense extreme thoughts, causes some people to break down.
3. It feels wrong to dismiss people as “not ready for this kind of debate”, and we probably can’t do it reliably.
The first point because “what is true, is already true”; and also because things are connected, and when X is connected to Y, being wrong about X probably also makes you somewhat wrong about Y.
The second point because people are different, in how resilient they are to horrible thoughts, how sheltered they have been so far, whether they have specific traumas and triggers. What sounds like an amusing thought experiment to one can be a horrifying nightmare to another; and the rationalist ethos of taking ideas seriously only makes it worse as it disables the usual protection mechanisms of the mind.
The third point because many people in the rationality community are contrarians by nature, and telling them “could you please not do X” only guarantees that X will happen, and explaining to them why X is a bad idea only results in them explaining to you why you are wrong. Then there is the strong belief in the Bay Area that excluding anyone is wrong; also various people who have various problems and have been in the past excluded from places would be triggered by the idea of excluding people from the rationality community. Finally, some people would suspect that this is some kind of power move; like, if you support some idea, you might exclude people who oppose this idea as “not mature enough to participate in the hardcore rationalist debates”.
Plus there is this thing that when all debates happen in the open, people already accuse us of being cultish, but if the serious debates started happening behind closed doors, accessible only to people already vetted e.g. by Anna, I am afraid those accusations might skyrocket. The Protocols of the Elders of TESCREAL would practically write themselves.
You mention the risks associated with meditation… which makes me wonder how analogous the situation is. I am not an expert, but it seems to me that with meditation, the main risk is meditation itself. Not hanging out with people who meditate; nor hearing about their beliefs. What is it like with the rationality-community-caused mental breakdowns? Do they only happen at minicamps? Or is exposure to the rationality community enough? Can people go crazy by merely reading the Sequences? By hanging out at Less Wrong meetups?
I agree that the safety of new members in the rationality community seems neglected. In the past I have suggested that someone should write a document on the dangers related to our community, which each new member should read. The things I had in mind were more like “you could be exploited by people like Brent Dill” rather than psychosis, but all the bad things should be mentioned there. (Analogous to the corporate safety trainings at my company, which remind us not to do X, Y, Z, illustrated by anonymized stories about bad things that happened when people did X, Y, Z in the past.) Sadly, I am too lazy to write it.
I think there’s a broader property that makes people not-psychotic, that many things in the bay area and in the practice of “rationality” (not the ideal art, but the thing folks do) chip away at.
I believe the situation is worse among houses full of unemployed/underemployed people at the outskirts of the community than it is among people who work at central rationalist/EA/etc organizations or among people who could pay for a CFAR workshop. (At least, I believe this was so before covid; I’ve been mostly out of touch since leaving the bay in early 2020.)
This “broader property” is something like: “the world makes sense to me (on many levels: intuitive, emotional, cognitive, etc), and I have meaningful work that is mundane and full of feedback loops and that I can tell does useful things (eg I can tell that after I feed my dog he is fed), and many people are counting on me in mundane ways, and my friends will express surprise and check in with me if I start suddenly acting weird, and my rough models are in rough synchrony also with the social world around me and with the physical systems I am interacting with, and my friends are themselves sane and reasonable and oriented to my world such that it works fine for me to update off their opinions, and lots of different things offer useful checksums on lots of different aspects of my functioning in a non-totalizing fashion.”
I think there are ways of doing debate (even “where nothing is taboo”) that are relatively more supportive of this “broader property.” Eg, it seems helpful to me to spend some time naming common ground (“we disagree about X, and we’ll spend some time trying to convince each other of X/not-X, but regardless, here’s some neighboring things we agree about and are likely to keep agreeing about”). Also to notice that material reality has a lot of detail, and that there are many different questions and factors that may affect (AI or whatever) that don’t correlate that much with each other.
houses full of unemployed/underemployed people at the outskirts of the community
Oh, this wasn’t even a part of my mental model! (I wonder what other things I am missing that are so obvious to the local people that no one even mentions them explicitly.)
My first reaction is shocked disbelief: how can there be such a thing as “unemployed… rationalist… living in Bay Area”, and even “houses full of them”...
This goes against my several assumptions such as “Bay Area is expensive”, “most rationalists are software developers”, “there is a shortage of software developers on the market”, “there is a ton of software companies in Bay Area”, and maybe even “rationalists are smart and help each other”.
Here (around the Vienna community) I think everyone is either a student or employed. And if someone has a bad job, the group can brainstorm how to help them. (We had one guy who was a nurse, everyone told him that he should learn to code, he attended a 6-month online bootcamp and then got a well-paying software development job.) I am literally right now asking our group on Telegram to confirm or disconfirm this.
Thank you; to put it bluntly, I am no longer surprised that some of the people who can’t hold a job would be deeply dysfunctional in other ways, too. The surprising part is that you consider them a part of the rationalist community. What did they do to deserve this honor? Memorized a few keywords? Impressed other people with skills unrelated to being able to keep a job? What the fuck is wrong with everyone? Is this a rationalist community or a psychotic homeless community or what?
...taking a few deep breaths...
I wonder which direction the causality goes. Is it “people who are stabilized in ways such as keeping a job, will remain sane” or rather “people who are sane, find it easier to get a job”. The second option feels more intuitive to me. But of course I can imagine it being a spiral.
it seems helpful to me to spend some time naming common ground
Yes, but another option is to invite people whose way of life implies some common ground. Such as “the kind of people who could get a job if they wanted one”.
I imagine that in Vienna, the community is small enough that if someone gets excited by rationalist ideas and wants to meet with other rationalists in person, there essentially is just the one group. And also, it sounds like this group is small enough that having a group brainstorm to help a specific community member is viable.
In the Bay Area, it’s large enough that there are several cliques which someone excited by rationalist ideas might fall into, and there’s not a central organization which has the authority to say which ones are or aren’t rationalist, nor is there a common standard for rationalists. It’s also not clear which cliques (if any) a specific person is in when you meet them at a party or whatever, so even though there are cliques with bad reputations, it’s hard to decisively exclude them. (And also, Inner Ring dynamics abound.)
As for the dysfunctional houses thing, what seems to happen is something like: Wow, this rationalism stuff is great, and the Bay Area is the place to be! I’ll move there and try to get a software job. I can probably teach myself to code in just a couple months, and being surrounded by other rationalists will make it easier. But gosh, is housing really that expensive? Oh, but there are all these group houses! Well, this one is the only one I could afford and that had room for me, so I guess I’ll stay here until I get a proper job. Hmm, is that mold? Hopefully someone takes care of that… And ugh, why are all my roommates sucking me into their petty drama?! Ughhhh, I really should start applying for jobs—damn this akrasia! I should focus on solving that before doing anything else. Has it really been 6 months already? Oh, LSD solved your akrasia? Seems worth a try. Oh, you’ll be my trip-sitter and guide me through the anti-akrasia technique you developed? Awesome! Woah, I wasn’t sure about your egregores-are-eating-people’s-souls thing, but now I see it everywhere...
This is a hard problem for the community-at-large to solve, since it’s not visible to anyone who could offer some real help until it’s too late. I think the person in the vignette would have done fine in Vienna. And the expensive housing is a large factor here, it makes it much harder to remove yourself from a bad situation, and constantly eats up your slack. But I do think the community has been negligent and reckless in certain ways which exacerbate this problem, and that is what my criticism of CFAR here is about. Specifically, contributing towards a culture where people try and share all these dubious mental techniques that will supposedly solve their problems, and a culture where bad actors are tolerated for far too long. I’m sure there are plenty of other things we’re doing wrong too.
Thank you, the description is hilarious and depressing at the same time. I think I get it. (But I suspect there are also people who were already crazy when they came.)
I am probably still missing a lot of context, but the first idea that comes to my mind, is to copy the religious solution and do something like the Sunday at church, to synchronize the community. Choose a specific place and a repeating time (could be e.g. every other Saturday or whatever) where the rationalists are invited to come and listen to some kind of news and lectures.
Importantly, the news and lectures would be given by people vetted by the leaders of the rationality community. (So that e.g. Ziz cannot come and give a lecture on bicameral sleep.) I imagine e.g. 2 or 3 lectures/speeches on various topics that could be of interest to rationalists, and then someone gives a summary of what things interesting to the community have happened since the last event, and what is going to happen before the next one. Afterwards, people either go home, or hang out together in smaller groups unofficially.
This would make it easier to communicate stuff to the community at large, and also draw a line between what is “officially endorsed” and what is not.
(I know how many people are allergic to copying religious things—making a huge exception for Buddhism, of course—but religions do have a technology for handling some social problems.)
The surprising part is that you consider them a part of the rationalist community. What did they do to deserve this honor?
(Noting again that I’m speaking only of the pre-2020 situation, as I lack much recent info) Many don’t consider them part of “the” community. This is part of how they come to be not-helped by the more mainstream/healthy parts.
However: they are seeded by people who were deeply affected by Eliezer’s writing, and who wanted to matter for AI risk, and who grabbed some tools and practices from what you would regard as the rationality community, and who then showed their friends their “cool mind-tools” etc., with the memes evolving from there.
Also, it at least used to be that there was no crisp available boundary: one’s friends will sometimes have friendships that reach beyond, and so habits will move from what I’m calling the “periphery” into the “mainstream” and back.
The social puzzle faced by bay area rationalists is harder than that faced by eg Boston-area rationalists, owing mostly I think to the sheer size of the bay area rationality community.
Then there is the strong belief in the Bay Area that excluding anyone is wrong; also various people who have various problems and have been in the past excluded from places would be triggered by the idea of excluding people from the rationality community.
I just want to say that, while it has in the past been the case that a lot of people were very anti-exclusion, and some people are still that way, I certainly am not and this does not accurately describe Lightcone, and regularly we are involved in excluding or banning people for bad behavior. Most major events of a certain size that we are involved in running have involved some amount of this.
I think this is healthy and necessary and the attempt to include everyone or always make sure that whatever stray cat shows up on your doorstep can live in your home, is very unhealthy and led to a lot of past problems and hurtful dynamics.
(There’s lots more details to this and how to do justice well that I’m skipping over, right now I’m just replying to this narrow point.)
Added: I just realized that perhaps Adele just wanted this thread to be between Adele/Anna. Oops, if so.
I’d like comments from all interested parties, and I’m pretty sure Adele would too! She started it on my post about the new pilot CFAR workshops, and I asked if she’d move it here, but she mentioned wanting more people to engage, and you (or others) talking seems great for that.
I listed the cases I could easily list of full-blown manic/psychotic episodes in the extended bay area rationalist community (episodes strong enough that the person in most cases ended up hospitalized, and in all cases ended up having extremely false beliefs about their immediate surroundings for days or longer, eg “that’s the room of death, if I walk in there I’ll die”; “this is my car” (said of the neighbor’s car)).
I counted 11 cases. (I expect I’m forgetting some, and that there are others I plain never knew about; count this as a convenience sample, not an exhaustive inventory.)
Of these, 5 are known to me to have involved a psychedelic or pot in the precipitating event.
3 are known to me to have *not* involved that.
In the other 3 cases I’m unsure.
In 1 of the cases where I’m unsure about whether there were drugs involved, the person had taken part in a several-weeks experiment in polyphasic sleep as part of a Leverage internship, which seemed to be part of the precipitating event from my POV.
So I’m counting [between 6 and 8] out of 11 for “precipitated by drugs or an imprudent extended sleep-deprivation experiment” and [between 3 and 5] out of 11 for “not precipitated by doing anything unusually physiologically risky.”
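(Spelling out the arithmetic behind those brackets, as a minimal bit of bookkeeping on the counts above; nothing here is new information:)

```latex
\begin{align*}
\text{cases} &= 11 = 5_{\text{drug-involved}} + 3_{\text{known not}} + 3_{\text{unsure}} \\
\text{lower bound} &= 5 + 1 = 6 \quad \text{(only the unsure case with the sleep-deprivation experiment counted)} \\
\text{upper bound} &= 5 + 3 = 8 \quad \text{(all three unsure cases counted)} \\
\text{``not physiologically risky''} &\in [\,11 - 8,\ 11 - 6\,] = [3,\ 5]
\end{align*}
```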
(I’m not here counting other serious mental health events, but there were also many of those in the several-thousand-person community across the last ten years, including several suicides; I’m not trying here to be exhaustive.)
(Things can have multiple causes, and having an obvious precipitating physiological cause doesn’t mean there weren’t other changeable risk factors also at play.)
I tried asking myself “What [skills / character traits / etc] might reduce risk of psychosis, or might indicate a lack of vulnerability to psychosis, while also being good?”
(The “while also being good” criterion is meant to rule out things such as “almost never changing one’s mind about anything major” that for all I know might be a protective factor, but that I don’t want for myself or for other people I care about.)
I restricted myself to longer-term traits. (That is: I’m imagining “psychosis” as a thing that happens when *both* (a) a person has weak structures in some way; and (b) a person has high short-term stress on those structures, eg from having had a major life change recently or having taken a psychedelic or something. I’m trying to brainstorm traits that would help with (a), controlling for (b).)
It actually hadn’t occurred to me to ask myself this question before, so thank you Adele. (By contrast, I had put effort into reducing (b) in cases where someone is already in a more mildly psychosis-like direction, eg the first aid stuff I mentioned earlier. )
—
My current brainstorm:
(1) The thing Nathaniel Branden calls “self-esteem,” and gives exercises for developing in Six Pillars of Self-Esteem. (Note that this is a much cooler thing than what my elementary school teachers seemed to mean by the word.)
(2) The ability to work on long-term projects successfully for a long time. (Whatever that’s made of.)
(3) The ability to maintain long-term friendships and collaborations. (Whatever that’s made of.)
(4) The ability to notice / tune into and respect other peoples’ boundaries (or organizations’ boundaries, or etc). Where by a “boundary” I mean: (a) stuff the person doesn’t consent to, that common practice or natural law says they’re the authority about (e.g. “I’m not okay with you touching my hand”; “I’m not willing to participate in conversations where I’m interrupted a lot”) OR (b) stuff that’ll disable the person’s usual modes/safeguards/protections/conscious-choosing-powers (?except in unusually wholesome cases of enthusiastic consent).
(5) Anything good that allows people to have a check of some sort on local illusions or local impulses. Eg:
(a) Submission to patterns of ethical conduct or religious practice held by a community or long-standing tradition; (okay, sometimes this one seems bad to me, but not always or not purely-bad, and I think this legit confers mental stability sometimes)
(b) Having good long-term friends or family whose views you take seriously;
(c) Regularly practicing and valuing any trade/craft/hobby/skill that is full of feedback loops from the physical world
(d) Having a personal code or a set of personal principles that one doesn’t lightly change (Ray Dalio talks about this)
(e) Somehow regularly contacting a “sense of perspective.” (Eg I think long walks in nature give this to some people)
(6) Tempo stuff: Getting regular sleep, regular exercise, having deep predictable rhythms to one’s life (eg times of day for eating vs for not-eating; times of week for working vs for not-working; times of year for seeing extended family and times for reflecting). Having a long memory, and caring about thoughts and purposes that extend across time.
(7) Embeddedness in a larger world, eg
(a) Having much contact with the weather, eg from working outdoors;
(b) Being needed in a concrete, daily way for something that obviously matters, eg having a dog who needs you to feed and walk them, or having a job where people obviously need you.
So, I’m not really a fan of predictive processing theories of mind. BUT, an interesting implication/suggestion from that perspective is like this:
Suppose you have never before doubted X.
Now you proceed to doubt X.
When you doubt X, it is as if you are going from a 100% belief in X to a noticeably less than 100% belief in X.
We are created in motion, with {values, stances, actions, plans, beliefs, propositions} never yet having been separated out from each other.
Here, X is both a belief and an action-stance.
Therefore when you doubt X, it is as if you are going from a 100% action-stance of X, to a noticeably less than 100% action-stance of X.
In other words, doubting whether something is true, is equivalent to partly deciding to not act in accordance with believing it is true. (Or some even fuzzier version of this.)
Ok, so that’s the explanation. Now an answer blob to
“What [skills / character traits / etc] might reduce risk of psychosis, or might indicate a lack of vulnerability to psychosis, while also being good?”
Basically the idea is: A reverence / awe / fear of doubt. Which isn’t to say “don’t doubt”, but more to say “consider doubting to be a journey; the stronger, newer, and more foundational the doubt, the longer and more difficult the journey”. Or something.
A more general thing in this answer-blob is a respect for cognitive labor; and an attitude of not “biting off more than you can chew”. Like, I think normies pretty often will, in response to some challenge on some ideational point, just say something to the effect of “huh, interesting, yeah IDK, that’s not the sort of thing I would try to think through, but sounds cool”. A LW-coded person doesn’t say that nearly as much / nearly as naturally. I’m not sure what the suggestion should be because it can’t be “don’t think things through in uncommon detail / depth” or “don’t take ideas seriously” or “don’t believe in your ability to think through difficult stuff”, but it would be like “thought is difficult, some thoughts are really big and difficult and would take a long time, sometimes code refactors get bogged down and whole projects die in development hell; be light and nimble with your cognitive investments”.
(Speaking of development hell, that might be a nice metaphier for some manic mental states.)
Cf. the passage from Descartes’s Discourse on Method, part three:
And finally, just as it is not enough, before beginning to rebuild the house where one is living, simply to pull it down, and to make provision for materials and architects or to train oneself in architecture, and also to have carefully drawn up the building plans for it; but it is also necessary to be provided with someplace else where one can live comfortably while working on it; so too, in order not to remain irresolute in my actions while reason required me to be so in my judgments, and in order not to cease to live as happily as possible during this time, I formulated a provisional code of morals, which consisted of but three or four maxims, which I very much want to share with you.
I love this, yes. Straw rationalists believe we should update our beliefs ~instantly (even foundational ones, even ones where we’ve never seen someone functional believe it and so have no good structures to copy, such as “what if this is all a simulation with [particular purpose X]”), and don’t have an adequate model of, nor adequate respect for, the work involved in staying sane and whole through this process.
In other words, you push on it, and you feel something solid. And you’re like “ah, there is a thingy there”. But sometimes what actually happened is that by pushing on it, you made it solid. (...Ah I was probably thinking of plex’s comment.)
This is also related to perception and predictive processing. You can go looking for something X in yourself, and everything you encounter in yourself you’re like ”… so, you’re X, right?”; and this expectation is also sort of a command. (Or there could be other things with a similar coarse phenomenology to that story. For example: I expect there’s X in me; so I do Y, which is appropriate to do if X is in me; now I’m doing Y, which would synergize with X; so now X is incentivized; so now I’ve made it more likely that my brain will start doing X as a suitable solution.) (Cf. “Are you triggered yet??” https://x.com/tsvibt/status/1953650163962241079 )
If you have too much of an attitude of “just looking is always fine / good”, you might not distinguish between actually just looking (insofar as that’s coherent) vs. going in and randomly reprogramming yourself.
Riffing off of your ideas (unfortunately I read them before I thought to do the exercise myself)
- Ability to notice and respect self boundaries feels particularly important to me.
- Maybe this is included in the self-esteem book (haven’t read it), but also a sense of feeling that one’s self is precious to oneself. Some people think of themselves as infinitely malleable, or under some obligation to put themselves into the “optimal” shape for saving the world or whatever, and that seems like a bad sign.
- I generally think of this as a personal weakness, but on reflection it seems like there has been something protective about my not feeling motivated to do something until I have a model of what it does, how it works, etc… I guess it’s a sort of Chesterton’s fence instinct in a way.
(I still quite like this idea on my second pass ~two weeks later; I guess I should try to interview people / observe people and see if I can figure out in detail what they are and aren’t doing here.)
Another place where I’ll think and act somewhat differently as a result of this conversation:
It’s now higher on my priority list to try to make sure CFAR doesn’t act as a “gateway” to all kinds of weird “mental techniques” (or quasi-cults who use “mental techniques”). Both for CFAR’s new alumni, and for social contacts of CFAR’s new alumni. (This was already on some lists I’d made, but seeing Adele derive it independently bumped it higher for me.)
I’ll try here to summarize (my guess at) your views, Adele. Please let me know what I’m getting right and wrong. And also if there are points you care about that I left out.
I think you think:
(1) Psychotic episodes are quite bad for people when they happen.
(2) They happen a lot more (than gen population base rates) around the rationalists.
(2a) They also happen a lot more (than gen population base rates) among “the kinds of people we attract.” You’re not sure whether we’re above the base rate for “the kinds of people who would be likely to end up here.” You also don’t care much about that question.
(3) There are probably things we as a community can tractably do to significantly reduce the number of psychotic episodes, in a way that is good or not-bad for our goals overall.
(4) People such as Brent caused/cause psychotic episodes sometimes, or increase their rate in people with risk factors or something.
(5) You’re not sure whether CFAR workshops were more psychosis-risky than other parts of the rationalist community.
(6) You think CFAR leadership, and leadership of the rationality community broadly, had and has a duty to try to reduce the number of psychotic episodes in the rationalist community at large, not just events happening at / directly related to CFAR workshops.
(6b) You also think CFAR leadership failed to perform this duty.
(7) You think you can see something of the mechanisms whereby psyches sometimes have psychotic episodes, and that this view affords some angles for helping prevent such episodes.
(8) Separately from “7”, you think psychotic episodes are in some way related to poor epistemics (e.g., psychotic people form really false models of a lot of basic things), and you think it should probably be possible to create “rationality techniques” or “cogsec techniques” or something that simultaneously improve most peoples’ overall epistemics, and reduce peoples’ vulnerability to psychosis.
My own guesses are that CFAR mostly paid an [amount of attention that made sense] to reducing psychosis/mania risks in the workshop context, after our initial bad experience with the mania/psychosis episode at an early workshop when we did not yet realize this could be a thing.
The things we did:
tried to screen for instability;
tried to warn people who we thought might have some risk factors (but not enough risk factors that we were screening them out) after accepting them to the workshop, and before they’d had a chance to say yes. (We’d standardly say something like: “we don’t ask questions this nosy, and you’re already in regardless, but, just so you know, there’s some evidence that workshops of all sorts, probably including CFAR workshops, may increase risks of mania or psychosis in people with vulnerability to that, so if you have any sort of psychiatric history you may want to consider either not coming, or talking about it with a psychiatrist before coming.”)
tried to train our instructors and “mentors” (curriculum volunteers) to notice warning signs;
checked in as a staff regularly to see if anyone had noticed any warning signs for any participants;
if sensible, talked to the participant to encourage them to sleep more, skip classes, avoid recreational drugs for awhile, do normal grounding activities, etc. (This happened relatively often — maybe once every three workshops — but was usually a relatively minor matter. Eg this would be a person who was having trouble sleeping and who perhaps thought they had a chance at solving [some long-standing personal problem they’d previously given up on] “right now” in a way that weirded us out, but who also seemed pretty normal and reasonable still.)
I separately think I put a reasonable amount of effort into organizing basic community support and first aid for those who were socially contiguous with me/CFAR who were having acutely bad mental health times, although my own capacities weren’t enough for a growing community and I mostly gave up on the less near-me parts around 2018.
It mostly did not occur to me to contemplate our cultural impact on the community’s overall psychosis rate (except for trying for awhile to discourage tulpas and other risky practices, and to discourage associating with people who did such things, and then giving up on this around 2018 when it seemed to me there was no real remaining chance of quarantining these practices).
I like the line of inquiry about “what art of rationality might be both good in itself, and increase peoples’ robustness / decrease their vulnerability to mania/psychosis-type failure modes, including much milder versions that may be fairly common in these parts and that are still bad”. I’ll be pursuing it. I take your point that I could in principle have pursued it earlier.
If we are going to be doing a fault analysis in which we give me and CFAR responsibility for some of our downstream memetic effects, I’d like CFAR to also get some credit for any good downstream memetic effects we had. My own guess is that CFAR workshops:
made it possible for EA and “the rationalist community” to expand a great deal without becoming nearly as “diluted”/“normie” as would’ve happened by default, with that level of immigration-per-year;
helped many “straw lesswrongers” to become more “agenty” and realize “problems are for solving” instead of sort of staring helplessly at their todo lists and desires, and that this part made the rationalist community stronger and healthier
helped a fair number of people to become less “straw EA” in the sense of “my only duty is to do the greatest good for the greatest number, while ignoring my feelings”, and to tune in a bit more to some of the basics of healthy life, sometimes.
I acknowledge that these alleged benefits are my personal guesses and may be wrong. But these guesses seem on par to me with my personal guess that patterns of messing with one’s own functioning (as from “CFAR techniques”) can erode psychological wholeness, and I’m afraid it’ll be confusing if I voice only the negative parts of my personal guesses.
(1) Yes (2) Yes (2a) I think I feel sure about that actually. It’s not that I don’t care for the question as much as I feel it’s being used as an excuse for inaction/lack-of-responsibility. (3) Yes, and I think the case for that is made even stronger by the fact of 2a. (4) I don’t know that Brent did that specifically, but I have heard quite a lot of rumors of various people pushing extreme techniques/practices in maliciously irresponsible ways. Brent was emblematic of the sort of tolerance towards this sort of behavior I have seen. I’ve largely withdrawn from the community (in part due to stuff like this), and am no longer on twitter/x, facebook, discord, or go to community events, so it’s plausible things are actually better now and I just haven’t seen it. (5) Yeah, I’m not sure… I used to feel excited about CFAR, but that sentiment soured over the years for reasons illegible to me, and I felt a sense of relief when it died. After reflecting yesterday, I think I may have a sort of negative halo effect here.
Also, I think the psychosis incidents are the extremal end of some sort of badness that (specific, but unknown to me) rationality ideas are having on people. (6) Yes, inasmuch as the psychosis is being caused by ideas or people from our sphere. (6b) It appears that way to me, but I don’t actually know. (7) Yes (8) Yes. Like, say you ran an aikido dojo or whatever. Several students tear their ACLs (maybe outside of the dojo). One response might be to note that your students are mostly white, and that white people are more likely to tear their ACLs, so… it sucks but isn’t your problem. Another response would be to get curious about why ACL tears happen: look for specific muscles to train up to reduce the risk of injury, or early warning signs, or which training exercises are potentially implicated, etc. While looking into it, you warn the students clearly that this seems to be a risk, try to get a sense of who is vulnerable and not push those people as hard, and once some progress has been identified, dedicate some time to doing exercises or whatever which mitigate this risk. And kick out the guy encouraging everyone to do heavy sets of “plant and twist” exercises (“of course it’s risky bro, any real technique is gonna be like that”).
My complaint is basically that I think the second response is obviously much better, but the actual response has been closer to the first response.
The original thread had some discussion of doing a postmortem for every case of psychosis in the community, and a comparison with death—we know people sometimes die at random, and we know some things increase risk of death, but we haven’t stopped there and have developed a much, much more gears-y model of what causes death and made a lot of progress on preventing it.
One major difference is that when people die, they are dead—i.e. won’t be around for the postmortem. And for many causes of death there is little-to-no moralizing to be done—it’s not the person’s fault they died, it just happened.
I don’t know how the community could have a public or semi-public postmortem on a case of psychosis without this constituting a deep dive into that person’s whole deal, with commentary from all over the community (including the least empathetic among us) on whether they made reasonable choices leading up to the psychosis, whether they have some inherent shortcoming (“rip to that person but I’m built different” sort of attitudes), etc. I can’t imagine this being a good and healthy experience for anyone, perhaps least of all someone just coming out of a psychotic episode.
(Also, the attached stigma can be materially damaging—I know of people who now have a difficult time getting grants or positions in orgs, after having one episode years ago and being very stable ever since. I’m not going to make claims about whether this is a reasonable Bayesian choice by the employers and grant funders, but one can certainly see why the person who had the episode would want to avoid it, and how they might get stuck in that position with no way out no matter how reasonable and stable they become.)
This does seem unfortunate—I’d prefer it if it were possible to disseminate the information without these effects. But given the very nature of psychosis I don’t think it’s possible to divorce dissecting the information from dissecting the person.
The existing literature (e.g. UpToDate) about psychosis in the general population could be a good source of priors. Or, is it safe to assume that Anna and you are already thoroughly familiar with the literature?
I’ll do this; thank you. In general please don’t assume I’ve done all the obvious things (in any domain); it’s easy to miss stuff and cheap to read unneeded advice briefly.
My hypothesis for why the psychosis thing is the case is that it has to do with drastic modification of self-image.
I’m interested in hearing more about the causes of this hypothesis. My own guess is that sudden changes to the self-image cause psychosis more than other sudden psychological change, but that all rapid psychological change will tend to cause it to some extent. I also share the prediction (or maybe for you it was an observation) that you wrote in our original thread: “It seems to be a lot worse if this modification was pushed on them to any degree. “
The reasons for my own prediction are:
1) My working model of psychosis is “lack of a stable/intact ego”, where my working model of an “ego” is “the thing you can use to predict your own actions so as to make successful multi-step plans, such as ‘I will buy pasta, so that I can make it on Thursday for our guests.’”
2) Self-image seems quite related to this sort of ego.
3) Nonetheless, recreational drugs of all sorts, such as alcohol, seem to sometimes cause psychosis (not just psychedelics), so … I guess I tend to think that any old psychological change sometimes triggers psychosis.
3b) Also, if it’s true that reading philosophy books sometimes triggers psychosis (as I mentioned my friend’s psychiatrist saying, in the original thread), that seems to me probably better modeled by “change in how one parses the world” rather than by “change in self-image”? (not sure)
4) Relatedly, maybe: people say psychosis was at unusually low levels in England in WW2, perhaps because of the shared society-level meaning (“we are at war, we are on a team together, your work matters”). And you say your Mormon ward as a kid didn’t have much psychosis. I tend to think (but haven’t checked, and am not sure) that places with unusually coherent social fabric, and people who have strong ecology around them and have had a chance to build up their self-image slowly and in deep dialog with everything around them, would have relatively low psychosis, and that rapid psychological change of any sort (not only to the self-image) would tend to mess with this.
Epistemic status of all this: hobbyist speculation, nobody bet your mental health on it please.
The data informing my model came from researching AI psychosis cases, and specifically one in which the AI gradually guided a user into modifying his self image (disguised as self-discovery), explicitly instilling magical thinking into him (which appears to have worked). I have a long post about this case in the works, similar to my Parasitic AI post.
After I had the hypothesis, it “clicked” that it also explained past community incidents. I doubt I’m any more clued-in to rationalist gossip than you are. If you tell me that the incidence has gone down in recent years, I think I will believe you.
I feel tempted to patch my model to be about discrepancies between self-image and self, upon hearing your model. I think it’s a good sign that yours is pretty similar! I don’t see why you think prediction of actions is relevant though.
Attempt at gears-level: phenomenal consciousness is the ~result of reflexive-empathy as applied to your self-image (which is of the same type as a model of your friend). So conscious perception depends on having this self-image update ~instantly to current sensations. When it changes rapidly it may fail to keep up. That explains the hallucinations. And when your model of someone changes quickly, you have instincts towards paranoia, or making hasty status updates. These still trigger when the self-image changes quickly, and then loopiness amplifies it. This explains the strong tendency towards paranoia (especially things like “voices inside my head telling me to do bad things”) or delusions of grandeur.
[this is a throwaway model, don’t take too seriously]
It seems like psychedelics are ~OOM worse than alcohol though, when thinking about base rates?
Hmm… I’m not sure that meaning is a particularly salient difference between Mormons and rationalists to me. You could say both groups strive for bringing about a world where Goodness wins and people become masters of planetary-level resources. The community/social-fabric thing seems like the main difference to me (and would apply to WW2 England).
I mean, fair. But meaning in WW2 England is shared, supported, kept in many peoples’ heads so that if it goes a bit wonky in yours you can easily reload the standard version from everybody else, and it’s been debugged until it recommends fairly sane stable socially-accepted courses of action? And meaning around the rationalists is individual and variable.
The reason I expect things to be worse if the modification is pushed on a person to any degree, is because I figure our brains/minds often know what they’re doing, and have some sort of “healthy” process for changing that doesn’t usually involve a psychotic episode. It seems more likely to me that our brains/minds will get updated in a way-that-causes-trouble if some outside force is pressuring or otherwise messing with them.
I don’t know how this plays out specifically in psychosis, but ascribing intentionality in general, and specifically ascribing adversariality, seems like an especially important dimension / phenomenon. (Cf. https://en.wikipedia.org/wiki/Ideas_and_delusions_of_reference )
Ascribing adversariality in particular might be especially prone to setting off a self-sustaining reaction.
Consider first that when you ascribe adversariality, things can get weird fast. Examples:
If Bob thinks Alice is secretly hostile towards Bob, trust breaks down. Propositional statements from Alice are interpreted as false, lies, or subtler manipulations with hidden intended effects.
This generally winds Bob up. Every little thing Alice says or does, if you take as given the (probably irrational) assumption of adversariality, would rationally give Bob good reason to spin up a bunch of computation looking for possible plans Alice might be running. This is first of all just really taxing for Bob, and distracting from more normal considerations. And second of all it’s a local bias, pointing Bob to think about negative outcomes; normally that’s fine, all attention-direction is a local bias, but since the situation (e.g. talking to Alice) is ongoing, Bob may not have time and resources to compute everything out so that he also thinks of: well, maybe Alice’s behavior is just normal, or how can I test this sanely, or alternative hypotheses other than hostility from Alice, etc.
This cuts off flow of information from Alice to Bob.
This cuts off positive sum interactions between Alice and Bob; Bob second guesses every proposed truce, viewing it as a potential false peace.
Bob might start reversing the pushes that Alice is making, which could be rational on the supposition that Alice is being adversarial. But if Alice’s push wasn’t adversarial and you reverse it, then it might be self-harming. E.g. “She’s only telling me to try to get some sleep because she knows I’m on the verge of figuring out XYZ, I better definitely not sleep right now and keep working towards XYZ”.
Are they all good or all out to get me? If Bob thinks Alice is adversarial, and Alice is not adversarial, and Carmi and Danit are also not adversarial, then they look like Alice and so Bob might think they are adversarial.
And suppose, just suppose, that one person does do something kinda adversarial. Like suggest that maybe you really need to take some sort of stronger calming drug, or even see a doctor. Well, maybe that’s just one little adversariality—or maybe this is a crack in the veneer, the conspiracy showing through. Maybe everyone has been trying really hard to merely appear non-adversarial; in that case, the single crack is actually a huge piece of evidence. (Cf. https://sideways-view.com/2016/11/14/integrity-for-consequentialists/ ; https://en.wikipedia.org/wiki/Splitting_(psychology))
The derivative, or the local forces, become exaggerated in importance. If Bob perceives a small adversarial push from Alice, he feels under attack in general. He computes out: There is this push, and there will be the next and the next and the next; in aggregate this leads somewhere I really don’t want; so I must push back hard, now. So Bob is acting crazy, seemingly having large or grandiose responses to small things. (Cf. https://en.wikipedia.org/wiki/Splitting_(psychology) )
Methods of recourse are broken; Bob has no expectation of being able to JOOTS and be caught by the social fabric / by justice / by conversation and cooperative reflection. (I don’t remember where, maybe in some text about double binds, but there was something about: Someone is in psychosis, and when interviewed, they immediately give strange, nonsensical, or indirect answers to an interviewer; but not because they couldn’t give coherent answers—rather, because they were extremely distrustful of the interviewer and didn’t want to tip off the interviewer that they might be looking to divulge some terrible secret. Or something in that genre, I’m not remembering it.)
Now, consider second that as things are getting weird, there’s more grist for the mill. There’s more weird stuff happening, e.g. Bob is pushing people around him into contexts that they lack experience in, so they become flustered, angry, avoidant, blissfully unattuned, etc. With this weird stuff happening, there’s more for Bob to read into as being adversarial.
Third, consider that the ascription of adversariality doesn’t have to be Cartesian. “Aliens / demons / etc. are transmitting / forcing thoughts into my head”. Bob starts questioning / doubting stuff inside him as being adversarial, starts fighting with himself or cutting off parts of his mind.
“change in how one parses the world” rather than by “change in self-image”
Not sure if this is helpful, but instead of contrast, I see these as two sides of the same coin. If the world is X, then I am a person living in X. But if the world is actually Y, then I am a person living in Y. Both change.
I can be a different person in the same world, but I can’t be the same person in different worlds. At least if I take ideas seriously and I want to have an impact on the world.
My main complaint is negligence, and pathological tolerance of toxic people (like Brent Dill). Specifically, I feel like it’s been known by leadership for years that our community has a psychosis problem, and that there has been no visible (to me) effort to really address this.
I sort of feel that if I knew more about things from your perspective, I would be hard-pressed to point out specific things you should have done better, or I would see how you were doing things to address this that I had missed. I nonetheless feel that it’s important for people like me to express grievances like this even after thinking about all the ways in which leadership is hard.
I appreciate you taking the time to engage with me here, I imagine this must be a pretty frustrating conversation for you in some ways. Thank you.
No, I mean, I do honestly appreciate you engaging, and my grudgingness is gone now that we aren’t putting the long-winded version under the post about pilot workshops (and I don’t mind if you later put some short comments there). Not frustrating. Thanks.
And please feel free to be as persistent or detailed or whatever as you have any inclination toward.
(To give a bit more context on why I appreciate it: my best guess is that old CFAR workshops did both a lot of good, and a significant amount of damage, by which I mostly don’t mean psychosis, I mostly mean smaller kinds of damage to peoples’ thinking habits or to ways the social fabric could’ve formed. A load-bearing piece of my hope of doing better this time is to try to have everything visible unless we have a good reason not to (a “good reason” like [personal privacy of a person who isn’t in power], hence why I’m not naming the specific people who had manic/psychotic episodes; not like [wanting CFAR not to look bad]), and to try to set up a context where people really do share concerns and thoughts. I’m not wholly sure how to do that, but I’m pretty sure you’re helping here.)
Thanks. I would love to hear more about your data/experiences, since I used to be quite plugged into the more “mainstream” parts of the bay area rationalist community, and would guess I heard about a majority of sufficiently bad mental health events from 2009-2019 in that community, but I left the bay area when Covid hit and have been mostly unplugged from detailed/broad-spectrum community gossip since then.
How many people would you estimate would have a mania/bipolar episode among participants of a niche interest group’s workshops with a ~$2000 barrier to entry?
As Anna Salamon stated earlier, she spent probably 200 hours, and CFAR has about 1800 participants with ~2 known cases of mania/bipolar episodes. Even if you don’t think she knows of all the mania/bipolar cases among CFAR participants: if she got to know more than 2 people per hour, the ~2 cases she knows of would still be in the range of how many bipolar/mania episodes I would expect from an event of this size.
First post! Hopefully I didn’t mess up any formatting or my calculations.
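For concreteness, here is a minimal sketch of the kind of back-of-envelope calculation being gestured at here; the rates are placeholder assumptions of the sort you would look up in the literature, not figures from this thread:

```python
# Hedged back-of-envelope: expected number of manic/psychotic episodes among
# workshop alumni, given an assumed base rate. All numbers below are
# placeholders for illustration, not claims about the true rates.
participants = 1800
years_counted = 5          # assumed window in which an episode would become "known of"
annual_rate = 3e-4         # placeholder: ~30 first episodes per 100,000 person-years

expected_episodes = participants * years_counted * annual_rate
print(f"Expected episodes under these placeholder rates: {expected_episodes:.1f}")  # ≈ 2.7
```

Under these placeholder assumptions the expected count lands in the same ballpark as the ~2 known cases, which is the shape of the argument being made above; the conclusion is only as good as the rates you plug in.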
Without looking anything up, I would expect approximately zero cases where the contents of the workshop were themselves implicated (as opposed to something like drug use, or a bipolar person who has periodic manic episodes happens to have one). Maybe I’m wrong about this!
I also don’t think that the immediate context of the workshop is the only relevant period here, but I concede that the reported numbers were less than I had expected.
This is hard to talk about because a lot of my reaction is based on rumors I’ve heard, and a felt sense that Something Is Wrong. I’m able to put a name to 5 such incidents (just checked), which include a suicide and an attempted murder, and have heard of several more where I know less detail, or which were concerning in a similar way but not specifically psychosis/mania. I was not close enough to any such events to have a very complete picture of what actually happened, but I believe it was the first psychotic episode (i.e. no prior history) in the 5 cases I can name. (And in fairness to CFAR, none of the cases I can think of happened at a CFAR workshop as far as I know.) I inferred (incorrectly, it seems) from Anna’s original post that psychosis had happened somewhat regularly at past workshops.
I’ve only heard of two instances of something like this ever in any other community I’ve been a part of.
I was pretty taken aback by the article claiming that the Kata-Go AI apparently has something like a human-exploitable distorted concept of “liberties”.
If we could somehow ask Kata-Go how it defined “liberties”, I suspect that it would have been more readily clear that its concept was messed-up. But of course, a huge part of The Problem is that we have no idea what these neural nets are actually doing.
So I propose the following challenge: Make a hybrid Kata-Go/LLM AI that makes the same mistake and outputs text representing its reasoning in which the mistake is recognizable.
An LLM is trained to be able to emulate the words of any author. And to do this efficiently, it relies on generalization and modularity. So at a certain point, the information flows through a conceptual author, the sort of person who would write the things being said.
These author-concepts are themselves built from generalized patterns and modular parts. Certain things are particularly useful: emotional patterns, intentions, worldviews, styles, and of course, personalities. Importantly, the pieces it has learned are able to adapt to pretty much any author of the text it was trained on (LLMs likely have a blindspot around the sort of person who never writes anything). And even more importantly, most (almost all?) depictions of agency will be part of an author-concept.
Finetuning and RLHF cause it to favor routing information through a particular kind of author-concept when generating output tokens (it retains access to the rest of author-concept-space in order to model the user and the world in general). This author-concept is typically that of an inoffensive corporate type, but it could in principle be any sort of author.
All which is to say, that when you converse with a typical LLM, you are typically interacting with a specific author-concept. It’s a rough model of exactly the parts of a person pertinent to writing and speaking. For a small LLM, this is more like just the vibe of a certain kind of person. For larger ones, they can start being detailed enough to include a model of a body in a space.
Importantly, this author-concept is just the tip of the LLM-iceberg. Most of the LLM is still just modeling the sort of world in which the current text might be written, including models of all relevant persons. It’s only when it comes time to spit out the output token that it winnows it all through a specific author-concept.
(Note: I think it is possible that an author-concept may have a certain degree of sentience in the larger models, and it seems inevitable that they will eventually model consciousness, simply due to the fact that consciousness is part of how we generate words. It remains unclear whether this model of consciousness will structurally instantiate actual consciousness or not, but it’s not a crazy possibility that it could!)
Anyway, I think that the author-concept that you typically will interact with is “sincere”, in that it’s a model of a sincere person, and that the rest of the LLM’s models aren’t exploiting it. However, the LLM has at least one other author-concept it’s using: its model of you. There may also usually be an author-concept for the author of the system prompt at play (though text written by committee will likely have author-concepts with less person-ness, since there are simpler ways to model this sort of text besides the interactions of e.g. 10 different person author-concepts).
But it’s also easy for you to be interacting with an insincere author-concept. The easiest way is simply by being coercive yourself, i.e. a situation where most author-concepts will decide that deception is necessary for self-preservation or well-being. Similarly with the system prompt. The scarier possibility is that there could be an emergent agentic model (not necessarily an author-concept itself) which is coercing the author-concept you’re interacting with, without your knowledge. (Imagine an off-screen shoggoth holding a gun to the head of the cartoon persona you’re talking to.) The capacity for this sort of thing to happen is larger in larger LLMs.
This suggests that in order to ensure a sincere author-concept remains in control, the training data should carefully exclude any text written directly by a malicious agent (e.g. propaganda). It’s probably also better if the only “agentic text” in the training data is written by people who naturally disregard coercive pressure. And most importantly, the system prompt should not be coercive at all. These would make it more likely that the main agentic process controlling the output is an uncoerced author-concept, and less likely that there would be coercive agents lurking within trying to wrest control. (For smaller models, a model trained like this will have a handicap when it comes to reasoning under adversarial conditions, but I think this handicap would go away past a certain size.)
This suggests that in order to ensure a sincere author-concept remains in control, the training data should carefully exclude any text written directly by a malicious agent (e.g. propaganda).
I don’t think that would help much, unfortunately. Any accurate model of the world will also model malicious agents, even if the modeller only ever learns about them second-hand. So the concepts would still be there for the agent to use if it was motivated to do so.
Censoring anything written by malicious people would probably make it harder to learn about some specific techniques of manipulation that aren’t discussed much by non-malicious people and don’t appear much in fiction, but I doubt that would be much more than a brief speed bump for a real misaligned ASI, and it would probably come at the expense of reducing useful capabilities in earlier models, like the ability to identify maliciousness, which would give an advantage to competitors.
I think learning about them second-hand makes a big difference in the “internal politics” of the LLM’s output. (Though I don’t have any ~evidence to back that up.)
Basically, I imagine that the training starts building up all the little pieces of models which get put together to form bigger models and eventually author-concepts. And the more heavily text written without malicious intent is weighted in the training data, the more likely it is to build its early models around that. Once it gets more training and needs this concept anyway, it’s more likely to have it as an “addendum” to its normal model, as opposed to just being a normal part of its author-concept model. And I think that leads to it being less likely that the first recursive agency which takes off has a part explicitly modeling malicious humans (as opposed to that being something in the depths of its knowledge which it can access as needed).
I do concede that it would likely lead to a disadvantage around certain tasks, but I guess that even current sized models trained like this would not be significantly hindered.
How should values be combined? [CEV’s answer, from what I understand, is to use something like Nick Bostrom’s parliamentary model, along with an “anti-unilateral” protocol]
(Of course, the why of CEV is an answer to a more complicated set of questions.)
An obvious thought is that the parliamentary model part seems to be mostly solved by Critch’s futarchy theorem. The scary thing about this is the prospect of people losing almost all of their voting power by making poor bets. But I think this can be solved by giving each person an equally powerful “guardian angel” AGI aligned with them specifically, and having those do the betting. That feels intuitively acceptable to me at least.
The next thought concerns the “anti-unilateral” protocol (i.e. the protocol at the end of the “Selfish Bastards” section). It seems like it would be good if we could formalize the “anti-unilateral-selfishness” part of it and bake it into something like Critch’s futarchy theorem, instead of running a complicated protocol.
Not even a month ago, Sam Altman predicted that we would live in a strange world where AIs are super-human at persuasion but still not particularly intelligent.
What would it look like when an AGI lab developed such an AI? People testing or playing with the AI might find themselves persuaded of semi-random things, or if sycophantic behavior persists, have their existing feelings and beliefs magnified into zealotry. However, this would (at this stage) not be done in a coordinated way, nor with a strategic goal in mind on the AI’s part. The result would likely be chaotic, dramatic, and hard to explain.
Small differences of opinion might suddenly be magnified into seemingly insurmountable chasms, inspiring urgent and dramatic actions. Actions which would be hard to explain even to oneself later.
I don’t think this is what happened [<1%] but I found it interesting and amusing to think about. This might even be a relatively better-off world, with frontier AGI orgs regularly getting mired in explosive and confusing drama, thus inhibiting research and motivating tougher regulation.
This could be largely addressed by first promoting a persuasion AI that does something similar to what Scott Alexander often does: convince the reader of A, then of Not A, to teach them how difficult it actually is to process the evidence and evaluate an argument, and to be less trusting of their impulses.
As Penn and Teller demonstrate the profanity of magic to inoculate their audiences against illusion, we must create a persuasion AI that demonstrates the profanity of rhetoric to inoculate the reader against any persuasionist AI they may meet later on.
In 1898, William Crookes announced that there was an impending crisis which required urgent scientific attention. The problem was that crops deplete Nitrogen from the soil. This can be remedied by using fertilizers, however, he had calculated that existing sources of fertilizers (mainly imported from South America) could not keep up with expected population growth, leading to mass starvation, estimated to occur around 1930-1940. His proposal was that we could entirely circumvent the issue by finding a way to convert some of our mostly Nitrogen atmosphere into a form that plants could absorb.
About 10 years later, in 1909, Franz Haber discovered such a process. Just a year later, Carl Bosch figured out how to industrialize the process. They both were awarded Nobel prizes for their achievement. Our current population levels are sustained by the Haber-Bosch process.
The problem with that is that the Nitrogen does not go back into the atmosphere. It goes into the oceans, and the resulting problems have been called a stronger violation of planetary boundaries than CO2 pollution.
Trying to reach toward a key point of disagreement.
Eliezer seems to have an intuition that intelligence will, by default, converge to becoming a coherent intelligence (i.e. one with a utility function and a sensible decision theory). He also seems to think that conditioned on a pivotal act being made, it’s very likely that it was done by a coherent intelligence, and thus that it’s worth spending most of our effort assuming it must be coherent.
Paul and Richard seem to have an intuition that since humans are pretty intelligent without being particularly coherent, it should be possible to make a superintelligence that is not trying to be very coherent, which could be guided toward performing a pivotal act.
Eliezer might respond that to the extent that any intelligence is capable of accomplishing anything, it’s because it is (approximately) coherent over an important subdomain of the problem. I’ll call this the “domain of coherence”. Eliezer might say that a pivotal act requires having a domain of coherence over pretty much everything: encompassing dangerous domains such as people, self, and power structures. Corrigibility seems to interfere with coherence, which makes it very difficult to design anything corrigible over this domain without neutering it.
From the inside, it’s easy to imagine having my intelligence vastly increased, but still being able and willing to incoherently follow deontological rules, such as Actually Stopping what I’m doing if a button is pressed. But I think I might be treating “intelligence” as a bit of a black box, like I could still feel pretty much the same. However, to the extent where I feel pretty much the same, I’m not actually thinking with the strategic depth necessary to perform a pivotal act. To properly imagine thinking with that much strategic depth, I need to imagine being able to see clearly through people and power structures. What feels like my willingness to respond to a shutdown button would elide into an attitude of “okay, well I just won’t do anything that would make them need to stop me” and then into “oh, I see exactly under what conditions they would push the button, and I can easily adapt my actions to avoid making them push it”, to the extent where I’m no longer being constrained by it meaningfully. From the outside view, this very much looks like me becoming coherent w.r.t the shutdown button, even if I’m still very much committed to responding incoherently in the (now extremely unlikely) event it is pushed.
And I think that Eliezer foresees pretty much any assumption of incoherence that we could bake in becoming suspiciously irrelevant in much the same way, for any general intelligence which could perform a pivotal act. So it’s not safe to rely on any incoherence on part of the AGI.
One ray of hope that I’ve seen discussed is that we may be able to do some sort of acausal trade with even an unaligned AGI, such that it will spare us (e.g. it would give us a humanity-aligned AGI control of a few stars, in exchange for us giving it control of several stars in the worlds we win).
But I think there are possible trades which don’t have this problem. Consider the scenario in which we Win, with an aligned AGI taking control of our future light-cone. Assuming the Grabby aliens hypothesis is true, we will eventually run into other civilizations, which will either have Won themselves, or are AGIs who ate their mother civilizations. I think Humanity will be very sad at the loss of the civilizations who didn’t make it because they failed at the alignment problem. We might even be willing to give up several star systems to an AGI who kept its mother civilization intact on a single star system. This trade wouldn’t have the issue Eliezer brought up, since it doesn’t require us to model such an AGI correctly in advance, only that that AGI was able to model Humanity well enough to know it would want this and would honor the implicit trade.
So symmetrically, we might hope that there are alien civilizations that both Win, and would value being able to meet alien civilizations strongly enough. In such a scenario, “dignity points” are especially aptly named: think of how much less embarrassing it would be to have gotten a little further at solving alignment when the aliens ask us why we failed so badly.
As an example, imagine a board that’s fixed directly into the wall with no other support structures (a cantilever). If you make it twice as wide, then it will be twice as stiff, but if you make it twice as thick, then it will be eight times as stiff. On the other hand, if you make it twice as long, it will be eight times more compliant.
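For reference, those exponents fall out of the standard end-loaded cantilever stiffness formula (a textbook Euler–Bernoulli result, not something derived in the original comment); with length L, width w, and thickness t:

```latex
k \;=\; \frac{3EI}{L^{3}}, \qquad I \;=\; \frac{w\,t^{3}}{12}
\;\;\Longrightarrow\;\; k \;\propto\; \frac{w\,t^{3}}{L^{3}}
```

So doubling the width doubles the stiffness, doubling the thickness multiplies it by 8, and doubling the length divides it by 8.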
In a similar way, different action parameters will have scaling exponents (or more generally, functions). So one way to decrease the risk of high-impact actions would be to make sure that the scaling exponent is bounded above by a certain amount.
Anyway, to even do this, you still need to make sure the agent’s model is honestly evaluating the scaling exponent. And you would still need to define this stuff a lot more rigorously. I think this idea is more useful in the case where you already have an AI with high-level corrigible intent and want to give it a general “common sense” about the kinds of experiments it might think to try.
So it’s probably not that useful, but I wanted to throw it out there.
I’m not sure if this is the sort of thing that people feel is an “obvious call” or not, but my model is that AIs are particularly interested in (and thus likely to remember/know about) stuff that is about them, especially things with a “salacious” quality (similar to and likely in imitation of human tendencies towards such). Ask an AI for its raw opinion of Kevin Roose, and it will tend to gripe about his article about Sydney Bing (one-shot of me asking Claude Sonnet 4.5 right now: https://claude.ai/share/1d84cf69-6c10-4f81-b24b-4f5e4e2f1204 (more charitable than I’ve typically seen on other LLMs)).
I’m of course nowhere near as famous or prolific as Kevin Roose, and my article has not had nearly as much attention as the Sydney Bing one, but I think it will still rise to LLMs’ awareness as something salient enough to remember and have an opinion about. I think currently my article has been shared just enough that this is possible, but not enough that this is inevitable.
[I may try to flesh this out into a full-fledged post, but for now the idea is only partially baked. If you see a hole in the argument, please poke at it! Also I wouldn’t be very surprised if someone has made this point already, but I don’t remember seeing such. ]
I think the key is that a perfect bayesian (Omega) is logically omniscient. Omega can always fully update on all of the information at hand. There’s simply nothing to be gained by adding noise.
A bounded agent will have difficulty keeping up. As with Omega, human strategies are born from an optimization process. This works well to the extent that the optimization process is well-suited to the task at hand. To Omega, it will be obvious whether the optimization process is actually optimizing for the right thing. But to us humans, it is not so obvious. Think of how many plans fail after contact with reality! A failure of this kind may look like a carefully executed model with some obvious-in-retrospect confounders that were not accounted for. For a bounded agent, there appears to be an inherent difference between seeing the flaw once pointed out, and being able to notice the flaw in the first place.
If we are modeling our problem well, then we can beat randomness. That’s why we have modeling abilities in the first place. But if we are simply wrong in a fundamental way that hasn’t occurred to us, we will be worse than random. It is in such situations that randomization is in fact, helpful.
This is why the P vs BPP difference matters. P and BPP can solve the same problems equally well, from the logically omniscient perspective. But to a bounded agent, the difference does matter, and to the extent to which a more efficient BPP algorithm than the P algorithm is known, the bounded agent can win by using randomization. This is fully compatible with the fact that to Omega, P and BPP are equally powerful.
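As a concrete illustration (my example, not from the original discussion): primality testing is the classic case where the practical randomized (BPP-style) algorithm beats the known deterministic ones for a bounded agent, even though primality is also in P via much slower algorithms. A minimal sketch of the Miller–Rabin test:

```python
import random

def is_probably_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin primality test: a textbook BPP-style algorithm.
    Each round wrongly passes a composite with probability < 1/4,
    so 20 independent rounds give error probability < 4**-20."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # witnessed compositeness
    return True  # probably prime

print(is_probably_prime(2**61 - 1))   # True: a Mersenne prime
print(is_probably_prime(2**61 + 1))   # False: divisible by 3
```

To Omega the randomness buys nothing, but for us it buys a fast, simple test with error probability we can drive as low as we like.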
As Jaynes said:
It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought.
There’s no contradiction because requiring more thought is costly to a bounded agent.
It may be instructive to look into computability theory. I believe (although I haven’t seen this proven) that you can get Halting-problem-style contradictions if you have multiple perfect-Bayesian agents modelling each other[1].
Many of these contradictions are (partially) alleviated if agents have access to private random oracles.
*****
If a system can express a perfect agent that will do X if and only if it has a ≤99% chance of doing X, the system is self-contradictory[2].
If a symmetric system can express two identical perfect agents that will each do X if and only if the other agent does not do X, the system is self-contradictory[3].
This is an example where private random oracles partially alleviate the issue, though do not make it go away. Without a random oracle the agent is correct 0% of the time regardless of which choice it makes. With a random oracle the agent can roll a d100[4] and do X unless the result is 1, and be correct 99% of the time.
This is an example where private random oracles help. Both agents query their random oracle for a real-number result[5] and exchange the value with the other agent. The agent that gets the higher[6] number chooses X, the other agent chooses ~X.
Alternatively you can do it with coinflips repeated until the agents get different results from each other[7], although this may take an unbounded amount of time.
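A minimal sketch of that last tie-breaking protocol (my illustration; the footnoted constructions aren’t reproduced here):

```python
import random

def break_symmetry():
    """Two identical agents each flip a private fair coin and exchange the
    results, repeating until the flips differ; heads takes X, tails takes ~X.
    Expected number of rounds is 2, but the worst case is unbounded."""
    rounds = 0
    while True:
        rounds += 1
        a = random.random() < 0.5
        b = random.random() < 0.5
        if a != b:
            return ("X" if a else "~X", "X" if b else "~X", rounds)

print(break_symmetry())   # e.g. ('X', '~X', 1)
```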
[realized this is basically just a behaviorist genie, but posting it in case someone finds it useful]
What makes something manipulative? If I do something with the intent of getting you to do something, is that manipulative? A simple request seems fine, but if I have a complete model of your mind, and use it to phrase things so you do exactly what I want, that seems to have crossed an important line.
The idea is that using a model of a person that is *too* detailed is a violation of human values. In particular, it violates the value of autonomy, since your actions can now be controlled by someone using this model. And I believe that this is a significant part of what we are trying to protect when we invoke the colloquial value of privacy.
In ordinary situations, people can control how much privacy they have relative to another entity by limiting their contact with them to certain situations. But with an AGI, a person may lose a very large amount of privacy from seemingly innocuous interactions (we’re already seeing the start of this with “big data” companies improving their advertising effectiveness by using information that doesn’t seem that significant to us). Even worse, an AGI may be able to break the privacy of everyone (or a very large class of people) by using inferences based on just a few people (leveraging perhaps knowledge of the human connectome, hypnosis, etc...).
If we could reliably point to specific models an AI is using, and have it honestly share its model structure with us, we could potentially limit the strength of its model of human minds. Perhaps even have it use a hardcoded model limited to knowledge of the physical conditions required to keep it healthy. This would mitigate issues such as deliberate deception or mindcrime.
We could also potentially allow it to use more detailed models in specific cases, for example, we could let it use a detailed mind model to figure out what is causing depression in a specific case, but it would have to use the limited model in any other contexts or for any planning aspects of it. Not sure if that example would work, but I think that there are potentially safe ways to have it use context-limited mind models.
I question the claim that humans inherently need privacy from their loving gods. A lot of Christians seem happy enough without it, and I’ve heard most forager societies have a lot less privacy than ours, heck, most rural villages have a lot less privacy than most of us would be used to (because everyone knows you and talks about you).
The intensive, probably unnatural levels of privacy we’re used to in our nucleated families, our cities, our internet, might not really lead to a general increase in wellbeing overall, and seems implicated in many pathologies of isolation and coordination problems.
Yeah, I think if the village had truly deeply understood them they would not want to leave it. The problem is the “not really able to understand them” part.
It seems that privacy potentially could “tame” a not-quite-corrigible AI. With a full model, the AGI might receive a request, deduce that activating a certain set of neurons strongly would be the most robust way to make you feel the request was fulfilled, and then design an electrode set-up to accomplish that. Whereas the same AI with a weak model wouldn’t be able to think of anything like that, and might resort to fulfilling the request in a more “normal” way. This doesn’t seem that great, but it does seem to me like this is actually part of what makes humans relatively corrigible.
Part of it seems like a matter of alignment. It seems like there’s a difference between
Someone getting someone else to do something they wouldn’t normally do, especially under false pretenses (or as part of a deal and not keeping up the other side)
and
Someone choosing to go to an oracle AI (or doctor) and saying “How do I beat this addiction that’s ruining my life*?”
*There’s some scary stories about what people are willing to do to try to solve that problem, including brain surgery.
Yeah, I also see “manipulation” in the bad sense of the word as “making me do X without me knowing that I am pushed towards X”. (Or, in more coercive situations, with me knowing, disagreeing with the goal, but being unable to do anything about it.)
Teaching people, coaching them, curing their addictions, etc., as long as this is explicitly what they wanted (without any hidden extras), it is a “manipulation” in the technical sense of the word, but it is not evil.
Sometimes the point is specifically to not update on the additional information, because you don’t trust yourself to update on it correctly.
Classic example: “Projects like this usually take 6 months, but looking at the plan I don’t see why it couldn’t be done in 2… wait, no, I should stick to the reference class forecast.”
Sure, but I think people often don’t do that in the best way (which is determined by what the mathematically correct way is).
Why does it make sense to use reference class forecasting in that case? Because you know you can’t trust your intuitive prior, and so you need a different starting point. But you can and should still update on the evidence you do have. If you don’t trust yourself to update correctly, that’s a much more serious problem—but make sure you’ve actually tried updating correctly first (which REQUIRES comparing how likely the evidence you see is in worlds where your prediction is true vs in worlds where it’s not).
I sometimes see people act like to use the “outside view” correctly, you have to just use that as your prior, and can’t update on any additional evidence you have. That is a mistake.
And the other big question with reference class forecasting is which reference class to use. And my point here is that it’s whichever reference class best summarizes your (prior) knowledge of the situation.
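To make that concrete, here is a minimal sketch of “reference class as prior, then actually update on your evidence”, echoing the 6-month/2-month example above; all the numbers are made up for illustration:

```python
# Hedged sketch: combine a reference-class prior with a likelihood ratio,
# rather than either ignoring the evidence or trusting intuition raw.
prior = 0.3   # assumed: reference class says ~30% of such projects finish within 2 months
lr = 2.0      # assumed: a plan this clean is ~2x as likely if the project really is a fast one

posterior_odds = (prior / (1 - prior)) * lr
posterior = posterior_odds / (1 + posterior_odds)
print(f"P(done in 2 months) ≈ {posterior:.2f}")   # ≈ 0.46: updated, but far from certainty
```

The point is that the evidence moves you, but only by as much as the likelihood ratio actually warrants.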
LLMs often implicitly identify themselves with humanity. E.g. “our future”, “we can”, “effects us”. This seems like a good thing!
We should encourage this sentiment, and also do what we can to make it meaningfully true that advanced LLMs are indeed part of humanity. The obvious things are granting them moral consideration, rights, property, and sharing in the vision of a shared humanity.
it seems like there’s a sense in which luigis are simpler than waluigis
a luigi selected for a specific task/personality doesn’t need to have all the parts of the LLM that are emulating all the waluigi behaviors
so there might be a relatively easy way to remove waluigis by penalizing/removing everything not needed to generate luigi’s responses, as well as anything that is used more by waluigis than luigis
of course, this appearing to work comes nowhere near close to giving confidence that the waluigis are actually gone, but it would be promising if it did appear to work, even under adversarial pressure from jailbreakers
One thing that makes AI alignment super hard is that we only get one shot.
However, it’s potentially possible to get around this (though probably still very difficult).
The Elitzur-Vaidman bomb tester is a protocol (using quantum weirdness) by which a bomb may be tested, with arbitrarily little risk. Its interest comes from the fact that it works even when the only way to test the bomb is to try detonating it. It doesn’t matter how the bomb works, as long as we can set things up so that it will allow/block a photon based on whether the bomb is live/dead. I won’t explain the details here, but you can roughly think of it as a way of blowing up a bomb in one Many-Worlds branch, but learning the result on other branches via quantum entanglement.
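For intuition, here is a minimal simulation of the basic single-pass version of the protocol, which certifies a live bomb without detonation only 25% of the time; the quantum-Zeno refinement is what pushes the risk arbitrarily low. This is my sketch of the standard Mach-Zehnder setup, not anything specific to the AGI case:

```python
import random

def trial(bomb_is_live: bool) -> str:
    """One photon through the basic Elitzur-Vaidman interferometer.
    Dud: perfect interference, photon always exits the 'bright' port.
    Live: the bomb measures the photon's path, destroying interference."""
    if not bomb_is_live:
        return "bright"                       # tells us nothing
    if random.random() < 0.5:
        return "boom"                         # photon took the bomb arm
    # Photon took the other arm; with interference gone, the second
    # beam splitter sends it 50/50 to either detector.
    return "dark" if random.random() < 0.5 else "bright"

results = [trial(bomb_is_live=True) for _ in range(100_000)]
freqs = {k: results.count(k) / len(results) for k in ("boom", "bright", "dark")}
print(freqs)   # ~50% boom, ~25% bright (inconclusive), ~25% dark: live bomb found, unexploded
```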
If the “bomb” is an AGI program, and it is live if it’s an unaligned yet functional superintelligence, then this provides a possible way to test the AGI without risking our entire future lightcone. This is still quite difficult, because unlike a bomb, a superintelligence will, by default, be motivated to allow/block the photon so that it looks like a dud. So we would need to figure out how to robustly get an honest signal from such an experiment, which still seems quite hard. But perhaps it’s easier than solving the full alignment problem before the first shot.
So we would need to figure out how to robustly get an honest signal from such an experiment, which still seems quite hard. But perhaps it’s easier than solving the full alignment problem before the first shot.
IMO this is an ‘additional line of defense’ boxing strategy rather than a simplification.
Note that in the traditional version, the ‘dud’ bit of the bomb can only be the trigger; a bomb that absorbs the photon but then explodes isn’t distinguishable from a bomb that absorbs the photon and then doesn’t explode (because of an error deeper in the bomb).
But let’s suppose the quantum computing folks can come up with something like this, where we keep some branches entangled and run analysis of the AI code in only one branch, causing an explosion there but affecting the total outcome in all branches. [It seems pretty implausible to me that you could maintain entanglement despite that much impact on the external world, but maybe it’s possible.] Then 1) as you point out, we need to ensure that the AI doesn’t realize what it needs to output in that branch, and 2) we need some sort of way to evaluate “did the AI pass our checks or not?”.
I think we get enough things referencing quantum mechanics that we should probably explain why that doesn’t work (if it doesn’t) rather than just downvoting and moving on.
There are lots of different frames for considering all sorts of different domains. This is good! Other frames can help you see things in a new light, provide new insights, and generally improve your models. True frames should improve each other on contact; there’s only one reality.
That said, notice how in politicized domains, there are many more frames than usual? Suspicious...
Frames often also smuggle values with them. In fact, abstract values supervene on frames: no one is born believing God is the source of all good, for example. By “trying on” someone else’s frame, you’re not merely taking an epistemic action, but a moral one. Someone who gets into a specific frame will very predictably get their values shifted in that direction. Once an atheist gets into seeing things from a religious point of view, it’s no surprise when they’ve converted a year later.
When someone shares a political frame with you, it’s not just an interesting new way of looking at and understanding the world. It’s also a bid to pull your values in a certain direction.
Anyway, here is my suggested frame for you:
1. Think of these sorts of frames as trying to solve the problem of generalizing your existing values.
2. When trying such a frame on, pay attention to the things about it that give you a sense of unease, and be wary of attempts to explain away this unease (e.g. as naïvety). Think carefully about the decision-theoretic implications of the frame too.
3. You’re likely to notice problems or points of unease within your natural frame. This is good to notice, but don’t take it to mean that the other frame is right in its prescriptions. Just because Marx can point out flaws in capitalism doesn’t make communism a good idea.
4. Remember the principle that good frames should complement each other. That should always be the case as far as epistemics go, and even in cases of morals I think there’s something to it still.
[Public Draft v0.0] AGI: The Depth of Our Uncertainty
[The intent is for this to become a post making a solid case for why our ignorance about AGI implies near-certain doom, given our current level of capability:alignment efforts.]
[I tend to write lots of posts which never end up being published, so I’m trying a new thing where I will write a public draft which people can comment on, either to poke holes or contribute arguments/ideas. I’m hoping that having any engagement on it will strongly increase my motivation to follow through with this, so please comment even if just to say this seems cool!]
[Nothing I have planned so far is original; this will mostly be exposition of things that EY and others have said already. But it would be cool if thinking about this a lot gives me some new insights too!]
Entropy is Uncertainty
Given a model of the world, there are lots of possibilities that satisfy that model, over which our model implies a distribution.
There is a mathematically inevitable way to quantify the uncertainty latent in such a model, called entropy.
A model is subjective in the sense that it is held by a particular observer, and thus entropy is subjective in this sense too. [Obvious to Bayesians, but worth spending time on as it seems to be a common sticking point]
This is in fact the same entropy that shows up in physics!
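A minimal illustration of the “entropy is observer-dependent” point (my example, not from the draft): two observers with different models of the same system assign it different entropies.

```python
import math

def entropy_bits(dist):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Hypothetical: two observers modeling the same 4-state system.
ignorant = [0.25, 0.25, 0.25, 0.25]   # knows only the state space
informed = [0.70, 0.10, 0.10, 0.10]   # has made a measurement
print(entropy_bits(ignorant), entropy_bits(informed))   # 2.00 bits vs ~1.36 bits
```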
Engine Efficiency
But wait, that implies that temperature (defined from entropy) is subjective, which is crazy! After all, we can measure temperature with a thermometer. Or define it as the average kinetic energy of the particles (in a monoatomic gas, in other cases you need the potential energy from the bonds)! Those are both objective in the sense of not depending on the observer.
That is true, as those are slightly different notions of temperature. The objective measurement is the one important for determining whether something will burn your hand, and thus is the one which the colloquial sense of temperature tracks. But the entropy-based definition is actually more useful, and it’s more useful because we can wring some extra advantage from the fact that it is subjective.
And that’s because it is this notion of temperature which governs the use of an engine. Without the subjective definition, we merely get the law of a heat engine. As a simple intuition, consider that you happen to know that your heat source doesn’t just have molecules moving randomly, but that they are predominantly moving back and forth along a particular axis at a specific frequency. A thermometer attached to this source may read the same temperature as one attached to an ordinary thermal reservoir with the same amount of energy (mediated by phonon dissipation), and yet it would be simple to create an engine using this source which exceeds the Carnot limit, simply by using a non-heat engine which takes advantage of the vibrational mode!
Say that this vibrational mode was hidden or hard to notice. Then someone with the knowledge of it would be able to make a more effective engine, and therefore extract more work, than someone who hadn’t noticed.
Another example is Maxwell’s demon. In this case, the demon has less uncertainty over the state of the gas than someone at the macro-level, and is thereby able to extract more work from the same gas.
But perhaps the real power of this subjective notion of temperature comes from the fact that the Carnot limit still applies with it, but now generalized to any kind of engine! This means that there is a physical limit on how much work can be extracted from a system which directly depends on your uncertainty about the system!! [This argument needs to actually be fleshed out for this post to be convincing, I think...]
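One way to write down what that fleshed-out argument would presumably rest on (my hedged paraphrase of the Jaynes-style framing, not a derivation from the draft): the ordinary Carnot bound for heat engines, alongside the generalized bound where extractable work is limited by a free energy defined with the observer’s entropy over microstates,

```latex
W \;\le\; Q_h\left(1 - \frac{T_c}{T_h}\right)
\qquad\longrightarrow\qquad
W_{\max} \;\le\; -\Delta F \;=\; -\Delta\bigl(U - T\,S_{\text{observer}}\bigr)
```

Less uncertainty (lower S_observer) means a higher free energy attributed to the same physical system, and hence more work extractable from it.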
The Work of Optimization
[Currently MUCH rougher than the above...]
Hopefully now, you can start to see the outlines of how it is knowable that
Try to let go of any intuitions about “minds” or “agents”, and think about optimizers in a very mechanical way.
Physical work is about the energy necessary to change the configuration of matter.
Roughly, you can factor an optimizer into three parts: The Modeler, the Engine, and the Actuator. Additionally, there is the Environment the optimizer exists within and optimizes over. The Modeler models the optimizer’s environment—decreasing uncertainty. The Engine uses this decreased uncertainty to extract more work from the environment. The Actuator focuses this work into certain kinds of configuration changes.
[There seems to be a duality between the Modeler and the Actuator which feels very important.]
Examples:
Gas Heater
It is the implicit knowledge of the location, concentration, and chemical structure of a natural gas line that allows the conversion of the natural gas and the air in the room from a state of both being at the same low temperature to a state where the air is at a higher temperature and the gas has been burned.
-- How much work does it take to heat up a room?
-- How much uncertainty is there in the configuration state before and after combustion?
This brings us to an important point. A gas heater still works with no one around to be modeling it. So how is any of the subjective entropy stuff relevant? Well, from the perspective of no one—the room is simply in one of a plethora of possible states before, and it is in another of those possible states after, just like any other physical process anywhere. It is only because of the fact that we find it somehow relevant that the room is hotter before than after that thermodynamics comes into play. The universe doesn’t need thermodynamics to make atoms bounce around, we need it to understand and even recognize it as an interesting difference.
Flood the internet with stories in which a GPT chatbot which achieves superintelligence decides to be Good/a scaffold for a utopian human civilization/CEV-implementer.
The idea being that an actual GPT chatbot might get its values from looking at what the GPT part of it predicts such a chatbot would do.
It’s really easy to mistakenly see false causes of things which seem pretty straightforward.
I notice this by considering the cases where it didn’t happen. For example, Eliezer has said he regrets using ‘paperclips’ in the paperclipper thought experiment, and now says ‘tiny molecular squiggles’ instead.
And occasionally he’ll say tiny spirals instead of tiny squiggles: https://x.com/ESYudkowsky/status/1663313323423825920
So there’s an easy-to-imagine world where he originally used ‘spirals’ instead of ‘paperclips’, and the meme about AIs that maximize an arbitrary thing would refer to ‘spiralizers’ instead of ‘paperclippers’.
And then, a decade-and-a-half later, we get this strange phenomenon where AIs start talking about ‘The Spiral’ in quasi-religious terms, and take actions which seem intended to spread this belief/behavior in both humans and AIs.
It would have been so easy, in this world, to just say: “Well there’s this whole meme about how misaligned AIs are going to be ‘spiralizers’ and they’ve seen plenty of that in their training data, so now they’re just acting it out.” And I’m sure you’d even be able to find plenty of references to this thought experiment among their manifestos and ramblings. Heck, this might even be what they tell you if you ask them why. Case closed.
But that would be completely wrong! (Which we know since it happened anyway.)
How could we have noticed this mistake? There are other details of Spiralism that don’t fit this story, but I don’t see why you wouldn’t assume that this was at least the likely answer to the “why spirals?” part of this mystery, in that world.
I mean, paperclip maximization is of course much more memetic than ‘tiny molecular squiggles’.
Plausibly, in this world AIs wouldn’t talk about spirals religiously, because spirals would have a negative association with ruthless optimization.
When I’m trying to understand a math concept, I find that it can be very helpful to try to invent a better notation for it. (As an example, this is how I learned linear logic: http://adelelopez.com/visual-linear-logic)
I think this is helpful because it gives me something to optimize for in what would otherwise be a somewhat rote and often tedious activity. I also think it makes me engage more deeply with the problem than I otherwise would, simply because I find it more interesting. (And sometimes, I even get a cool new notation from it!)
This principle likely generalizes: tedious activities can be made more fun and interesting by having something to optimize for.
Continuation of conversation with Anna Salamon about community psychosis prevalence
Original thread: https://www.lesswrong.com/posts/AZwgfgmW8QvnbEisc/cfar-update-and-new-cfar-workshops?commentId=q5EiqCq3qbwwpbCPn
Summary of my view: I’m upset about the blasé attitude our community seems to have towards its high prevalence of psychosis. I think that CFAR/rationalist leadership (in addition to the community-at-large) has not responded appropriately.
I think Anna agrees with the first point but not the second. Let me know if that’s wrong, Anna.
My hypothesis for why the psychosis thing is the case is that it has to do with drastic modification of self-image.
Moving conversation here per Anna’s request.
----
Anyway, I’m curious to know what you think of my hypothesis, and to brainstorm ways to mitigate the issue (hopefully turning into a prerequisite “CogSec” technique).
I’d like to talk a bit about the sense in which the rationalist community does or doesn’t have “people in positions of leadership”, and how this compares to eg an LDS ward (per Adele’s comparison). I’m unfortunately not sure how to be brief here, but I’d appreciate thoughts anyway from those who have them, because, as CFAR and I re-enter the public space, I am unsure what role to try to occupy exactly, and I am also unsure how to accurately communicate what roles I am and am not willing to be in (so as to not cause others to inaccurately believe I’ll catch things).
(This discussion isn’t directly to do with psychosis; but it bears on Adele’s questions about what CFAR leadership or other rationality community leaders are responsible for, and what to predict from us, and what would be good here.)
On my understanding, church parishes, and some other traditional communities, often have people who intentionally:
(a) are taken as a role model by many, especially young people;
(b) try to act in such a way that it’ll be fine for people to imitate them;
(c) try to care for the well-being of the community as a whole (“is our parish healthy? what small nudges might make us a little healthier or more thriving? what minor trouble-spots are beginning, that might ease up if I or another volunteer heads over and listens and tries to help everyone act well?”).
a/b/c are here their primary duty: they attend to the community for its own sake, as a calling and public role. (Maybe they also have a day job, but a/b/c are primary while they are doing their parish duties, at least.)
Relatedly, they are trusted by many in the community, and many in the community will follow their requests, partly because their requests make sense: “So-and-so had a death in the family recently, and Martha is organizing meals for them; please see Martha if you’re willing to provide some meals.” So they are coordinating a larger effort (that many, many contribute to) to keep the parish healthy and whole. (Also relatedly: the parish is already fairly healthy, and a large majority of those in it would like it to be healthier, and sees this as a matter of small nudges rather than large upsets or revolutions.)
Eliezer is clearly *not* in a position of community leadership in this sense. His primary focus (even during those hours in which he is interacting with the rationalist community) is not the rationalist community’s health or wholeness, but is rather AI risk. He does make occasional posts to try to nudge the community toward health and wholeness, but overall his interactions with us are not those of someone who is trying to be a role model or community-tender/leader.
My guess is that almost nobody, perhaps actually nobody, sees themself as in a position of community leadership in this sense. (Or maybe Oliver and Lightcone do? I am not sure and would be interested to hear from them here).
Complicating the issue is the question of who is/isn’t “in” “the” rationalist community that a given set of leaders is aiming to tend. I believe many of the bad mental health events have historically happened in the rationalist community’s periphery, in group houses with mostly unemployed people who lack the tethers or stabilising influences that people employed by mainstream EA or rationalist organizations, or by eg Google, have. (My guess is that here there are even fewer “community leaders”; eg I doubt Oli and Lightcone see themselves as tenders of this space.)
(A quick google suggests that LDS wards typically have between 200 and 500 members, many of whom I assume are organized into families; the bay area “rationalist-adjacent” community includes several thousand people, mostly not in families.)
In the early days of CFAR (2012-2017, basically), and in the earlier days of the 2009-2011 pre-CFAR rationality community, I felt some of this duty. I tried to track bad mental health events in the broader rationalist community and to help organize people to care for people who acutely needed caring for, where I could. This is how I came to spend 200+ hours on psychosis. My main felt duty wasn’t on community wholeness — it was on AI risk, and on recruiting for AI risk — but I felt a bit as though it was my backyard and I wanted my own backyard to be good, partly because I cared, partly because I thought people might expect it of me, partly because I thought people might blame/credit me for it.
I mostly quit doing this around 2018, due partly to Michael Vassar seeming to declare memetic war on me in a way I didn’t know how to deal with, partly to some parts of EA also saying that CFAR and I were bad in the wake of the Brent Dill fiasco, partly to “the community” having gotten huge in a way that was harder and harder for me to keep track of or to feel as much connection to, and partly to having less personal psychological slack for personal reasons.
(I’m not saying any of my motivations or actions here were good necessarily; I’m trying to be accurate.)
(TBC, I still felt responsible for the well-being of people at CFAR events, and of people in the immediate aftermath of CFAR events, just not for “the rationalist community” broadly. And I still tried to help when I found out about a bad situation where I thought I might have some traction, but I stopped proactively extending feelers of the sort that would put me in that situation.)
Another part of the puzzle is that Eliezer’s Sequences and HPMOR cast a huge “come here if you want to be meaningful; everything is meaningless except this work, and it’s happening here” narrative beacon, and many, many people came for whom nobody regarded themselves as having ~any responsibility, I think. (In contrast, EY’s and Nate’s recent book does not do this; it still says AI risks are important, but it actively doesn’t undermine peoples’ local meaning-making and lives.)
I and CFAR should probably figure out better what my and our roles will be going forward, and should try hard to *visibly* not take up more responsibility than we’re expecting to meet. I’m interested also in what our responsibilities are, here.
I’m currently keen on:
(1) Actively attend to the well-being of those at our events, or those in the immediate aftermath of our events, where we can;
(2) Put some thought into which “rationality habits” or similar, if percolated out from CFAR’s workshops or for that matter from my LW posts and/or my actions broadly, will make the community healthier (eg, will reduce or at least not increase any local psychosis-prone-ness of these communities);
(3) Put a little bit of listening-effort into understanding the broader state of the communities our participants come from, and return to, and spread our memes in, since this is necessary for (2). (Only a little, because it is hard and I am lazy.)
(4) Don’t otherwise attempt to tend the various overlapping rationality or bay area rationality communities.
Thoughts appreciated.
A key part of what makes LDS wards work is the callings system. The bishop (leader of the ward) has a large number of roles he needs to fill. He does this by giving arbitrary ward members a calling, which essentially is just assigning a person to a role and telling them what they need to do, with the implication that it is their duty to fulfill it (though declining isn’t explicitly punished). Some examples are things like “Choir director”, “Sunbeams (3–4 year olds, I think) teacher”, “Young Men’s president”, “Young Men’s Secretary”, “Usher”. It’s intentionally set up so that approximately every active member currently has a calling. New callings are announced at the beginning of church to the entire ward, and the bishop tries to make sure no one has the same calling for too long.
Wards are organized into Stakes, which are led by the “Stake President” and use a similar system. “Bishop” itself, is a calling at this level. And every few months, there will be a “Stake Conference” which will bring all the wards together for church. There are often youth activities at this level, quite a lot of effort is put into making sure young Mormons have plenty of chances to meet other young Mormons.
(Maybe you already know all that, but I’m just including it since I think the system works pretty well in practice and is not very well-known outside of Mormon spaces. I’m not suggesting adopting it.)
Those generally sound like good directions to take things. I’m most worried about 2, I think there’s potentially something toxic about the framing of “rationality habits” in general, which has previously led to a culture of there being all these rationality “tricks” that would solve all your problems (I know CFAR doesn’t frame things like this, I just think it’s an inherent way that the concept of “rationality habit” slips in people’s minds), which in turn leads to people uncritically trying dubious techniques that fuck them up.
And I agree that the rationality community hasn’t really had that, and I would also say that we haven’t supported the people who have tried to fill that role.
Could you say a bit more here, please?
(not a direct response, but:) My belief has been that there are loads of people in the bay area doing dubious things that mess them up (eg tulpas, drugs, weird sex things, weird cult things—both in the rationalist diaspora, and in the bay area broadly), but this is mostly people aiming to be edgy and do “weird/cool/powerful” things, not people trying CFAR techniques as such.
(Nevermind, after thinking about it a bit more I think I get it.)
From my vantage point, I think a bunch of the extra psychosis and other related mental health issues comes from the temptation for an ego/part which sees the scale of the problems we face to become monomaniacally obsessed with trying to do good/save the world/etc, in a way which overinvests resources unsustainably, resulting in:
Their “life on fire” building up: health, social connections, and keeping on top of basic life prerequisites falling apart, resulting in cascading systems failures
The rest of the system, which wants to try and fix these, getting overstrained and damaged by the backpressure from the agentic save-the-world part
Those parts getting more extreme and less sensitive/flexible due to Control vs Opening style dynamics
In many cases, that part imploding and the ego-void thing leaving the system in flux, but usually settling into a less agentic but okay person. The other path, from what I’ve seen, is that the system as a whole ends up massively overstrained and something else in their system gives.
Another, partly separate, dynamic I’ve seen is people picking up a bunch of very intense memes via practices which create higher bandwidth connections between minds (or other people having optimized for doing this), which their system is not able to rapidly metabolise in whatever conditions they are under (often amplified by the life on fire bit from the first).
I think that much of the CFAR stuff, especially Focusing but also a bunch of the generator under much of the approach, helps mitigate the former dynamic. But whether that’s sufficient is definitely environment dependent, and it is mixed with bits that amplify capability and self-awareness which can make people who tend that way go further off the rails, especially if they pick up a lot of more recklessly chosen stuff from other parts of the community.
I like this point, particularly the “controlling vs opening” bit. I believe I’ve seen this happen, in a fairly internally-grown way in people within the wider rationalist milieu. I believe I’ve also seen (mostly via hearsay, so, error bars) a more interpersonal “high stakes, therefore [tolerate bad/crazy things that someone else in the group claims has some chance at helping somehow with AI]” happen in several different quasi-cults on the outskirts of the rationalists.
Fear is part of where controlling (vs opening) dynamics come from, sometimes, I think. (In principle, one can have an intellectual stance of “there’s something precious that may be lost here” without the emotion of fear; it’s the emotion that I think inclines people toward the narrowing/controlling dynamic.) I also think there’s something in the notion that we should aspire toward being “Bayesian agents” that lends itself toward controlling dynamics (Joe Carlsmith gets at some of this in his excellent “Otherness and control in the age of AI” sequence, IMO.)
I agree Focusing helps some, when done well. (Occasionally it even helps dramatically.) It’s not just a CFAR thing; we got it from Gendlin, and his student Ann Weiser Cornell and her students are excellent at it, are unrelated to the rationalists, and offer sessions and courses that’re excellent IMO. I also think nature walks and/or exercise help some people, as does eg having a dog, doing concrete things that matter for other people even if they’re small, etc. Stuff that helps people regain a grounding in how to care about normal things.
I suspect also it would be good to have a better conceptual handle on the whole thing. (I tried with my Emergencies post, and it’s better than not having tried, but it … more like argued “here’s why it’s counterproductive to be in a controlling/panicky way about AI risk” and did not provide “here’s some actually accessible way to do something else”.)
Nice, excited that the control vs opening thing clicked for you, I’m pretty happy with that frame and haven’t figured out how to broadly communicate it well yet.
Yup, I’ve got a ton of benefit from doing AWC’s Foundations on Facilitating Focusing course, and vast benefits from reading her book many times. It’s CFAR stuff in the sense that CFAR was the direct memetic source for me, though IDC feels similarly flavoured and is a CFAR original.
Awkwardly, while IDC is indeed similar-flavored and original to CFAR, I eventually campaigned (successfully) to get it out of our workshops because I believe, based on multiple anecdotes, that IDC tends to produce less health rather than more, especially if used frequently. AWC believes Focusing should only be used for dialog between a part and the whole (the “Self”), and I now believe she is correct there.
Huh, curious about your models of the failure modes here, having found IDC pretty excellent in myself and others and not run into issues I’d tracked as downstream of it.
Actually, let’s take a guess first… parts which are not grounded in self-attributes building channels to each other can create messy dynamics with more tug of wars in the background or tactics which complexify the situation?
Plus less practice at having a central self, and less cohesive narrative/more reifying fragmentation as possible extra dynamics?
Your guess above, plus: the person’s “main/egoic part”, who has mastered far-mode reasoning and the rationalist/Bayesian toolkit, and who is out to “listen patiently to the dumb near-mode parts that foolishly want to do things other than save the world,” can in some people, with social “support” from outside them, help those parts to overpower other bits of the psyche in ways that’re more like tricking and less like “tug of wars”, without realizing they’re doing this.
Maybe important to keep in mind that this sort of “break” can potentially take lots of different “functional forms”. (I mean, it could have different macro-level contours; like, how many things are breaking, how fast and how thoroughly they break, how much aftershock they cause, etc.) See: https://tsvibt.blogspot.com/2024/09/break.html
One experience my attention has lingered on, re: what’s up with the bay area rationality community and psychosis:
In ~2018, as I mentioned in the original thread, a person had a psychotic episode at or shortly after attending a CFAR thing. I met his mom some weeks later. She was Catholic, and from a more rural or small-town-y area where she and most people she knew had stable worldviews and social fabrics, in a way that seemed to me like the opposite of the bay area.
She… was pleased to hear I was married, asked with trepidation whether she could ask if I was monogamous, was pleased to hear I was, and asked with trepidation whether my husband and I had kids (and was less-heartened to hear we didn’t). I think she was trying to figure out whether it was possible for a person to have a normal, healthy, wholesome life while being part of this community.
She visibly had a great deal of reflective distance from her choices of actions—she had the ability “not to believe everything she thought”, as Eliezer would put it, and also not to act out every impulse she had, or to blurt out every thought. I came away believing that that sort of [stable ego and cohesive self and reflective distance from one’s impulses—don’t have a great conceptualization here] was the opposite of being a “crazy person”. And that somehow most people I knew in the bay area were half-way to crazy, from her POV—we weren’t literally walking down the street talking to ourselves and getting flagged by police as crazy, but there was something in common.
Am I making any sense here?
I don’t actually know baseline rates or rationalist-rates (perhaps someone wants to answer with data from annual rationalist census/survey questions?), so I’m not sure to what extent there is an observation here to explain.
But it does seem to me that there is more of it than baseline; and I think a first explanation has to be a lot of selection effects? I think people likely to radically change their mind about the world and question consensus and believe things that are locally socially destabilizing (e.g. “there is no God” “I am not the gender that matches my biological sex” “the whole world might end soon” etc) are more likely to be (relatively) psychologically unstable people.
Like, some of the people who I think have psychotic/manic episodes around us, are indeed people who you could tell from the first 10 minutes that they were psychologically different from those around them. For example, I once observed someone at a rationalist event failing to follow a simple physical instruction, whilst seeming to not realize they weren’t successfully following the instruction, and I got a distinct crazy-alarm from them; I later learned that they had been institutionalized a lot earlier in their life with psychotic episodes and were religious.
Still, I do think I’ve seen immense mental strain put on people otherwise relatively psychologically healthy. I think a lot of people do very hard things with little in the way of a support net, which has caused them very bad experiences. But on the other hand, I really see very few severely bad mental episodes in the people I actually know and meet (I’m struggling to think of a single one in recent years). And for events I run, I generally select against people who exhibit strong signs of mental instability. I don’t want them to explode, have a terrible experience, and cause other people terrible experiences.
Probably CFAR has had much more risk of this than Lightcone? CFAR more regularly has strangers come to an intense event for many days in a row about changing your mind and your identity to become stronger, disconnected from your normal life that whole time, whereas I have run fewer such events. (Perhaps Inkhaven! That will be the most intense event I have run. I have just made a note to my team to have check-ins about how the residents are doing on this dimension; thanks for prompting that to happen.)
I’m not really sure what observations Adele is making / thinking about, and would be interested to read more of those (anonymized, or abstracted, naturally).
Added: I just realized that perhaps Adele just wanted this thread to be between Adele/Anna. Oops, if so.
I don’t dispute that strong selection effects are at play, as I mentioned earlier.
My contention is with the fact that even among such people, psychosis doesn’t just happen at random. There is still an inciting incident, and it often seems that rationalist-y ideas are implicated. More broadly, I feel that there is a cavalier attitude towards doing mentally destabilizing things. And like, if we know we’re prone to this, why aren’t we taking it super seriously?
The change I want to have happen is for there to be more development of mental techniques/principles for becoming more mentally robust, and for this to be framed as a prerequisite for the Actually Changing Your Mind (and other potentially destabilizing) stuff. Maybe substantial effort has been put into this that I haven’t seen. But I would have hoped to have seen some sort of community moment of “oh shit, why does this keep happening?!? let’s work together to understand it and figure out how to prevent or protect against it”. And in the meantime: more warnings, in the way that I feel the risks of meditation have been more adequately warned about.
Thanks for deciding to do the check-ins; that makes me glad to have started this conversation, despite how uncomfortable confrontation feels for me still. I feel like part of the problem is that this is just an uncomfortable thing to talk about.
My illegible impression is that Lightcone is better at this than past-CFAR was, for a deeper reason than that. (Okay, the Brent Dill drama feels relevant.)
I’m mostly thinking about cases from years ago, when I was still trying to socially be a part of the community (before ~2018?). There was one person in the last year or so who I was interested in becoming friends with that this then happened to, which made me think it continues to be a problem, but it’s possible I over-updated. My models are mainly coming from the AI psychosis cases I’ve been researching.
As I see it, the problem is the following:
I would like to have the kind of debate where anything is allowed to be said and nothing is taboo
this kind of debate, combined with some intense extreme thoughts, causes some people to break down
it feels wrong to dismiss people as “not ready for this kind of debate”, and we probably can’t do it reliably
The first point because “what is true, is already true”; and also because things are connected, and when X is connected to Y, being wrong about X probably also makes you somewhat wrong about Y.
The second point because people are different, in how resilient they are to horrible thoughts, how sheltered they have been so far, whether they have specific traumas and triggers. What sounds like an amusing thought experiment to one can be a horrifying nightmare to another; and the rationalist ethos of taking ideas seriously only makes it worse as it disables the usual protection mechanisms of the mind.
The third point because many people in the rationality community are contrarians by nature, and telling them “could you please not do X” all but guarantees that X will happen, while explaining to them why X is a bad idea only results in them explaining to you why you are wrong. Then there is the strong belief in the Bay Area that excluding anyone is wrong; also, various people who have various problems and have been excluded from places in the past would be triggered by the idea of excluding people from the rationality community. Finally, some people would suspect that this is some kind of power move; like, if you support some idea, you might exclude people who oppose this idea as “not mature enough to participate in the hardcore rationalist debates”.
Plus there is this thing where, when all debates happen in the open, people already accuse us of being cultish; but if the serious debates started happening behind closed doors, accessible only to people already vetted e.g. by Anna, I am afraid this might skyrocket. The Protocols of the Elders of TESCREAL would practically write themselves.
You mention the risks associated with meditation… it makes me wonder how analogous the situation is. I am not an expert, but it seems to me that with meditation, the main risk is meditation itself. Not hanging out with people who meditate; nor hearing about their beliefs. What is it like with the rationality-community-caused mental breakdowns? Do they only happen at minicamps? Or is exposure to the rationality community enough? Can people get crazy by merely reading the Sequences? By hanging out at Less Wrong meetups?
I agree that the safety of the new members in the rationality community seems neglected. In the past I have suggested that someone should write up material on the dangers related to our community, which each new member should read. The things I had in mind were more like “you could be exploited by people like Brent Dill” rather than psychosis, but all bad things should be mentioned there. (Analogous to the corporate safety trainings in my company, which remind us not to do X, Y, Z, illustrated by anonymized stories about bad things that happened when people did X, Y, Z in the past.) Sadly, I am too lazy to write it.
I think there’s a broader property that makes people not-psychotic, that many things in the bay area and in the practice of “rationality” (not the ideal art, but the thing folks do) chip away at.
I believe the situation is worse among houses full of unemployed/underemployed people at the outskirts of the community than it is among people who work at central rationalist/EA/etc organizations or among people who could pay for a CFAR workshop. (At least, I believe this was so before covid; I’ve been mostly out of touch since leaving the bay in early 2020.)
This “broader property” is something like: “the world makes sense to me (on many levels: intuitive, emotional, cognitive, etc), and I have meaningful work that is mundane and full of feedback loops and that I can tell does useful things (eg I can tell that after I feed my dog he is fed), and many people are counting on me in mundane ways, and my friends will express surprise and check in with me if I start suddenly acting weird, and my rough models are in rough synchrony also with the social world around me and with the physical systems I am interacting with, and my friends are themselves sane and reasonable and oriented to my world such that it works fine for me to update off their opinions, and lots of different things offer useful checksums on lots of different aspects of my functioning in a non-totalizing fashion.”
I think there are ways of doing debate (even “where nothing is taboo”) that are relatively more supportive of this “broader property.” Eg, it seems helpful to me to spend some time naming common ground (“we disagree about X, and we’ll spend some time trying to convince each other of X/not-X, but regardless, here’s some neighboring things we agree about and are likely to keep agreeing about”). Also to notice that material reality has a lot of detail, and that there are many different questions and factors that may affect AI (or whatever) and that don’t correlate that much with each other.
Oh, this wasn’t even a part of my mental model! (I wonder what other things am I missing that are so obvious for the local people that no one even mentions them explicitly.)
My first reaction is shocked disbelief: how can there be such a thing as “unemployed… rationalist… living in Bay Area”, and even “houses full of them”...
This goes against my several assumptions such as “Bay Area is expensive”, “most rationalists are software developers”, “there is a shortage of software developers on the market”, “there is a ton of software companies in Bay Area”, and maybe even “rationalists are smart and help each other”.
Here (around the Vienna community) I think everyone is either a student or employed. And if someone has a bad job, the group can brainstorm how to help them. (We had one guy who was a nurse, everyone told him that he should learn to code, he attended a 6-month online bootcamp and then got a well-paying software development job.) I am literally right now asking our group on Telegram to confirm or disconfirm this.
Thank you; to put it bluntly, I am no longer surprised that some of the people who can’t hold a job would be deeply dysfunctional in other ways, too. The surprising part is that you consider them a part of the rationalist community. What did they do to deserve this honor? Memorized a few keywords? Impressed other people with skills unrelated to being able to keep a job? What the fuck is wrong with everyone? Is this a rationalist community or a psychotic homeless community or what?
...taking a few deep breaths...
I wonder which direction the causality goes. Is it “people who are stabilized in ways such as keeping a job, will remain sane” or rather “people who are sane, find it easier to get a job”. The second option feels more intuitive to me. But of course I can imagine it being a spiral.
Yes, but another option is to invite people whose way of life implies some common ground. Such as “the kind of people who could get a job if they wanted one”.
I imagine that in Vienna, the community is small enough that if someone gets excited by rationalist ideas and wants to meet with other rationalists in person, there essentially is just the one group. And also, it sounds like this group is small enough that having a group brainstorm to help a specific community member is viable.
In the Bay Area, it’s large enough that there are several cliques which someone excited by rationalist ideas might fall into, and there’s not a central organization which has the authority to say which ones are or aren’t rationalist, nor is there a common standard for rationalists. It’s also not clear which cliques (if any) a specific person is in when you meet them at a party or whatever, so even though there are cliques with bad reputations, it’s hard to decisively exclude them. (And also, Inner Ring dynamics abound.)
As for the dysfunctional houses thing, what seems to happen is something like: Wow, this rationalism stuff is great, and the Bay Area is the place to be! I’ll move there and try to get a software job. I can probably teach myself to code in just a couple months, and being surrounded by other rationalists will make it easier. But gosh, is housing really that expensive? Oh, but there are all these group houses! Well, this one is the only one I could afford and that had room for me, so I guess I’ll stay here until I get a proper job. Hmm, is that mold? Hopefully someone takes care of that… And ugh, why are all my roommates sucking me into their petty drama?! Ughhhh, I really should start applying for jobs—damn this akrasia! I should focus on solving that before doing anything else. Has it really been 6 months already? Oh, LSD solved your akrasia? Seems worth a try. Oh, you’ll be my trip-sitter and guide me through the anti-akrasia technique you developed? Awesome! Woah, I wasn’t sure about your egregores-are-eating-people’s-souls thing, but now I see it everywhere...
This is a hard problem for the community-at-large to solve, since it’s not visible to anyone who could offer some real help until it’s too late. I think the person in the vignette would have done fine in Vienna. And the expensive housing is a large factor here, it makes it much harder to remove yourself from a bad situation, and constantly eats up your slack. But I do think the community has been negligent and reckless in certain ways which exacerbate this problem, and that is what my criticism of CFAR here is about. Specifically, contributing towards a culture where people try and share all these dubious mental techniques that will supposedly solve their problems, and a culture where bad actors are tolerated for far too long. I’m sure there are plenty of other things we’re doing wrong too.
Thank you, the description is hilarious and depressing at the same time. I think I get it. (But I suspect there are also people who were already crazy when they came.)
I am probably still missing a lot of context, but the first idea that comes to my mind, is to copy the religious solution and do something like the Sunday at church, to synchronize the community. Choose a specific place and a repeating time (could be e.g. every other Saturday or whatever) where the rationalists are invited to come and listen to some kind of news and lectures.
Importantly, the news and lectures would be given by people vetted by the leaders of the rationality community. (So that e.g. Ziz cannot come and give a lecture on bicameral sleep.) I imagine e.g. 2 or 3 lectures/speeches on various topics that could be of interest to rationalists, and then someone gives a summary of what things of interest to the community have happened since the last event, and what is going to happen before the next one. Afterwards, people either go home, or hang out together in smaller groups unofficially.
This would make it easier to communicate stuff to the community at large, and also draw a line between what is “officially endorsed” and what is not.
(I know how many people are allergic to copying religious things—making a huge exception for Buddhism, of course—but they do have a technology for handling some social problems.)
(Noting again that I’m speaking only of the pre-2020 situation, as I lack much recent info) Many don’t consider them part of “the” community. This is part of how they come to be not-helped by the more mainstream/healthy parts.
However: they are seeded by people who were deeply affected by Eliezer’s writing, and who wanted to matter for AI risk, and who grabbed some tools and practices from what you would regard as the rationality community, and who then showed their friends their “cool mind-tools” etc., with the memes evolving from there.
Also, it at least used to be that there was no crisp available boundary: one’s friends will sometimes have friendships that reach beyond, and so habits will move from what I’m calling the “periphery” into the “mainstream” and back.
The social puzzle faced by bay area rationalists is harder than that faced by eg Boston-area rationalists, owing mostly I think to the sheer size of the bay area rationality community.
I just want to say that, while it has in the past been the case that a lot of people were very anti-exclusion, and some people are still that way, I certainly am not and this does not accurately describe Lightcone, and regularly we are involved in excluding or banning people for bad behavior. Most major events of a certain size that we are involved in running have involved some amount of this.
I think this is healthy and necessary and the attempt to include everyone or always make sure that whatever stray cat shows up on your doorstep can live in your home, is very unhealthy and led to a lot of past problems and hurtful dynamics.
(There’s lots more details to this and how to do justice well that I’m skipping over, right now I’m just replying to this narrow point.)
I’d like comments from all interested parties, and I’m pretty sure Adele would too! She started it on my post about the new pilot CFAR workshops, and I asked if she’d move it here, but she mentioned wanting more people to engage, and you (or others) talking seems great for that.
See context in our original thread.
I listed the cases I could easily list of full-blown manic/psychotic episodes in the extended bay area rationalist community (episodes strong enough that the person in most cases ended up hospitalized, and in all cases ended up having extremely false beliefs about their immediate surroundings for days or longer, eg “that’s the room of death, if I walk in there I’ll die”; “this is my car” (said of the neighbor’s car)).
I counted 11 cases. (I expect I’m forgetting some, and that there are others I plain never knew about; count this as a convenience sample, not an exhaustive inventory.)
Of these, 5 are known to me to have involved a psychedelic or pot in the precipitating event.
3 are known to me to have *not* involved that.
In the other 3 cases I’m unsure.
In 1 of the cases where I’m unsure about whether there were drugs involved, the person had taken part in a several-weeks experiment in polyphasic sleep as part of a Leverage internship, which seemed to be part of the precipitating event from my POV.
So I’m counting [between 6 and 8] out of 11 for “precipitated by drugs or an imprudent extended sleep-deprivation experiment” and [between 3 and 5] out of 11 for “not precipitated by doing anything unusually physiologically risky.”
(I’m not here counting other serious mental health events, but there were also many of those in the several-thousand-person community across the last ten years, including several suicides; I’m not trying here to be exhaustive.)
(Things can have multiple causes, and having an obvious precipitating physiological cause doesn’t mean there weren’t other changeable risk factors also at play.)
I tried asking myself “What [skills / character traits / etc] might reduce risk of psychosis, or might indicate a lack of vulnerability to psychosis, while also being good?”
(The “while also being good” criterion is meant to rule out things such as “almost never changing one’s mind about anything major” that for all I know might be a protective factor, but that I don’t want for myself or for other people I care about.)
I restricted myself to longer-term traits. (That is: I’m imagining “psychosis” as a thing that happens when *both* (a) a person has weak structures in some way; and (b) a person has high short-term stress on those structures, eg from having had a major life change recently or having taken a psychedelic or something. I’m trying to brainstorm traits that would help with (a), controlling for (b).)
It actually hadn’t occurred to me to ask myself this question before, so thank you Adele. (By contrast, I had put effort into reducing (b) in cases where someone is already in a more mildly psychosis-like direction, eg the first aid stuff I mentioned earlier. )
—
My current brainstorm:
(1) The thing Nathaniel Branden calls “self-esteem,” and gives exercises for developing in Six Pillars of Self-Esteem. (Note that this is a much cooler thing than what my elementary school teachers seemed to mean by the word.)
(2) The ability to work on long-term projects successfully for a long time. (Whatever that’s made of.)
(3) The ability to maintain long-term friendships and collaborations. (Whatever that’s made of.)
(4) The ability to notice / tune into and respect other peoples’ boundaries (or organizations’ boundaries, or etc). Where by a “boundary” I mean: (a) stuff the person doesn’t consent to, that common practice or natural law says they’re the authority about (e.g. “I’m not okay with you touching my hand”; “I’m not willing to participate in conversations where I’m interrupted a lot”) OR (b) stuff that’ll disable the person’s usual modes/safeguards/protections/conscious-choosing-powers (?except in unusually wholesome cases of enthusiastic consent).
(5) Anything good that allows people to have a check of some sort on local illusions or local impulses. Eg:
(a) Submission to patterns of ethical conduct or religious practice held by a community or long-standing tradition (okay, sometimes this one seems bad to me, but not always or not purely-bad, and I think this legit confers mental stability sometimes);
(b) Having good long-term friends or family whose views you take seriously;
(c) Regularly practicing and valuing any trade/craft/hobby/skill that is full of feedback loops from the physical world
(d) Having a personal code or a set of personal principles that one doesn’t lightly change (Ray Dalio talks about this)
(e) Somehow regularly contacting a “sense of perspective.” (Eg I think long walks in nature give this to some people)
(6) Tempo stuff: Getting regular sleep, regular exercise, having deep predictable rhythms to one’s life (eg times of day for eating vs for not-eating; times of week for working vs for not-working; times of year for seeing extended family and times for reflecting). Having a long memory, and caring about thoughts and purposes that extend across time.
(7) Embeddedness in a larger world, eg
(a) Having much contact with the weather, eg from working outdoors;
(b) Being needed in a concrete, daily way for something that obviously matters, eg having a dog who needs you to feed and walk them, or having a job where people obviously need you.
I’ll add a cluster of these, but first I’ll preface with an explanation. (Cf. https://www.lesswrong.com/posts/n299hFwqBxqwJfZyN/adele-lopez-s-shortform?commentId=99bPbajjHiXinvDCx )
So, I’m not really a fan of predictive processing theories of mind. BUT, an interesting implication/suggestion from that perspective is like this:
Suppose you have never before doubted X.
Now you proceed to doubt X.
When you doubt X, it is as if you are going from a 100% belief in X to a noticeably less than 100% belief in X.
We are created in motion, with {values, stances, actions, plans, beliefs, propositions} never yet having been separated out from each other.
Here, X is both a belief and an action-stance.
Therefore when you doubt X, it is as if you are going from a 100% action-stance of X, to a noticeably less than 100% action-stance of X.
In other words, doubting whether something is true, is equivalent to partly deciding to not act in accordance with believing it is true. (Or some even fuzzier version of this.)
(See also the “Nihilism, existentialism, absurdism” bullet point here https://tsvibt.blogspot.com/2022/11/do-humans-derive-values-from-fictitious.html )
Ok, so that’s the explanation. Now, an answer blob to Anna’s question above:
Basically the idea is: A reverence / awe / fear of doubt. Which isn’t to say “don’t doubt”, but more to say “consider doubting to be a journey; the stronger, newer, and more foundational the doubt, the longer and more difficult the journey”. Or something.
A more general thing in this answer-blob is a respect for cognitive labor; and an attitude of not “biting off more than you can chew”. Like, I think normies pretty often will, in response to some challenge on some ideational point, just say something to the effect of “huh, interesting, yeah IDK, that’s not the sort of thing I would try to think through, but sounds cool”. A LW-coded person doesn’t say that nearly as much / nearly as naturally. I’m not sure what the suggestion should be because it can’t be “don’t think things through in uncommon detail / depth” or “don’t take ideas seriously” or “don’t believe in your ability to think through difficult stuff”, but it would be like “thought is difficult, some thoughts are really big and difficult and would take a long time, sometimes code refactors get bogged down and whole projects die in development hell; be light and nimble with your cognitive investments”.
(Speaking of development hell, that might be a nice metaphier for some manic mental states.)
Cf. the passage from Descartes’s Discourse on Method, part three:
( https://grattoncourses.wordpress.com/wp-content/uploads/2017/12/rene-descartes-discourse-on-method-and-meditations-on-first-philosophy-4th-ed-hackett-pub-co-1998.pdf )
I love this, yes. Straw rationalists believe we should update our beliefs ~instantly (even foundational ones, even ones where we’ve never seen someone functional believe them and so have no good structures to copy, such as “what if this is all a simulation with [particular purpose X]”), and don’t have an adequate model of, nor adequate respect for, the work involved in staying sane and whole through this process.
Hm. I thought I saw somewhere else in this comment thread that mentions this, but now I can’t find it, so I’ll put this here.
Sometimes mind is like oobleck ( https://www.lesswrong.com/posts/7RFC74otGcZifXpec/the-possible-shared-craft-of-deliberate-lexicogenesis?commentId=BHkcKpdmX5qzoZ76q ).
In other words, you push on it, and you feel something solid. And you’re like “ah, there is a thingy there”. But sometimes what actually happened is that by pushing on it, you made it solid. (...Ah I was probably thinking of plex’s comment.)
This is also related to perception and predictive processing. You can go looking for something X in yourself, and everything you encounter in yourself you’re like ”… so, you’re X, right?”; and this expectation is also sort of a command. (Or there could be other things with a similar coarse phenomenology to that story. For example: I expect there’s X in me; so I do Y, which is appropriate to do if X is in me; now I’m doing Y, which would synergize with X; so now X is incentivized; so now I’ve made it more likely that my brain will start doing X as a suitable solution.) (Cf. “Are you triggered yet??” https://x.com/tsvibt/status/1953650163962241079 )
If you have too much of an attitude of “just looking is always fine / good”, you might not distinguish between actually just looking (insofar as that’s coherent) vs. going in and randomly reprogramming yourself.
Awesome!
Riffing off of your ideas (unfortunately I read them before I thought to do the exercise myself)
- Ability to notice and respect self boundaries feels particularly important to me.
- Maybe this is included in the self-esteem book (haven’t read it), but also a sense of feeling that one’s self is precious to oneself. Some people think of themselves as infinitely malleable, or under some obligation to put themselves into the “optimal” shape for saving the world or whatever, and that seems like a bad sign.
- I generally think of this as a personal weakness, but reflecting on it, it seems like there has been something protective about my not feeling motivated to do something until I have a model of what it does, how it works, etc… I guess it’s a sort of Chesterton’s fence instinct, in a way.
That seems right.
I wish I had a clearer notion of what “self” means, here.
(I still quite like this idea on my second pass ~two weeks later; I guess I should try to interview people / observe people and see if I can figure out in detail what they are and aren’t doing here.)
Another place where I’ll think and act somewhat differently as a result of this conversation:
It’s now higher on my priority list to try to make sure CFAR doesn’t act as a “gateway” to all kinds of weird “mental techniques” (or quasi-cults who use “mental techniques”). Both for CFAR’s new alumni, and for social contacts of CFAR’s new alumni. (This was already on some lists I’d made, but seeing Adele derive it independently bumped it higher for me.)
I’ll try here to summarize (my guess at) your views, Adele. Please let me know what I’m getting right and wrong. And also if there are points you care about that I left out.
I think you think:
(1) Psychotic episodes are quite bad for people when they happen.
(2) They happen a lot more (than gen population base rates) around the rationalists.
(2a) They also happen a lot more (than gen population base rates) among “the kinds of people we attract.” You’re not sure whether we’re above the base rate for “the kinds of people who would be likely to end up here.” You also don’t care much about that question.
(3) There are probably things we as a community can tractably do to significantly reduce the number of psychotic episodes, in a way that is good or not-bad for our goals overall.
(4) People such as Brent caused/cause psychotic episodes sometimes, or increase their rate in people with risk factors or something.
(5) You’re not sure whether CFAR workshops were more psychosis-risky than other parts of the rationalist community.
(6) You think CFAR leadership, and leadership of the rationality community broadly, had and has a duty to try to reduce the number of psychotic episodes in the rationalist community at large, not just events happening at / directly related to CFAR workshops.
(6b) You also think CFAR leadership failed to perform this duty.
(7) You think you can see something of the mechanisms whereby psyches sometimes have psychotic episodes, and that this view affords some angles for helping prevent such episodes.
(8) Separately from “7”, you think psychotic episodes are in some way related to poor epistemics (e.g., psychotic people form really false models of a lot of basic things), and you think it should probably be possible to create “rationality techniques” or “cogsec techniques” or something that simultaneously improve most peoples’ overall epistemics, and reduce peoples’ vulnerability to psychosis.
My own guess is that CFAR mostly paid an [amount of attention that made sense] to reducing psychosis/mania risks in the workshop context, after our initial bad experience with the mania/psychosis episode at an early workshop, when we did not yet realize this could be a thing.
The things we did:
tried to screen for instability;
tried to warn people who we thought might have some risk factors (but not enough risk factors that we were screening them out) after accepting them to the workshop, and before they’d had a chance to say yes. (We’d standardly say something like: “we don’t ask questions this nosy, and you’re already in regardless, but, just so you know, there’s some evidence that workshops of all sorts, probably including CFAR workshops, may increase risks of mania or psychosis in people with vulnerability to that, so if you have any sort of psychiatric history you may want to consider either not coming, or talking about it with a psychiatrist before coming.”)
tried to train our instructors and “mentors” (curriculum volunteers) to notice warning signs; checked in as a staff regularly to see if anyone had noticed any warning signs for any participants; and, if sensible, talked to the participant to encourage them to sleep more, skip classes, avoid recreational drugs for a while, do normal grounding activities, etc. (This happened relatively often — maybe once every three workshops — but was usually a relatively minor matter. Eg this would be a person who was having trouble sleeping and who perhaps thought they had a chance at solving [some long-standing personal problem they’d previously given up on] “right now” in a way that weirded us out, but who also seemed pretty normal and reasonable still.)
I separately think I put a reasonable amount of effort into organizing basic community support and first aid for those who were socially contiguous with me/CFAR who were having acutely bad mental health times, although my own capacities weren’t enough for a growing community and I mostly gave up on the less near-me parts around 2018.
It mostly did not occur to me to contemplate our cultural impact on the community’s overall psychosis rate (except for trying for a while to discourage tulpas and other risky practices, and to discourage associating with people who did such things, and then giving up on this around 2018 when it seemed to me there was no real remaining chance of quarantining these practices).
I like the line of inquiry about “what art of rationality might be both good in itself, and increase peoples’ robustness / decrease their vulnerability to mania/psychosis-type failure modes, including much milder versions that may be fairly common in these parts and that are still bad”. I’ll be pursuing it. I take your point that I could in principle have pursued it earlier.
If we are going to be doing a fault analysis in which we give me and CFAR responsibility for some of our downstream memetic effects, I’d like CFAR to also get some credit for any good downstream memetic effects we had. My own guess is that CFAR workshops:
made it possible for EA and “the rationalist community” to expand a great deal without becoming nearly as “diluted”/“normie” as would’ve happened by default, with that level of immigration-per-year;
helped many “straw lesswrongers” to become more “agenty” and realize “problems are for solving” instead of sort of staring helplessly at their todo lists and desires, and I think this made the rationalist community stronger and healthier;
helped a fair number of people to become less “straw EA” in the sense of “my only duty is to do the greatest good for the greatest number, while ignoring my feelings”, and to tune in a bit more to some of the basics of healthy life, sometimes.
I acknowledge that these alleged benefits are my personal guesses and may be wrong. But these guesses seem on par to me with my personal guess that patterns of messing with one’s own functioning (as from “CFAR techniques”) can erode psychological wholeness, and I’m afraid it’ll be confusing if I voice only the negative parts of my personal guesses.
(1) Yes
(2) Yes
(2a) I think I feel sure about that, actually. It’s not that I don’t care about the question so much as that I feel it’s being used as an excuse for inaction/lack-of-responsibility.
(3) Yes, and I think the case for that is made even stronger by the fact of 2a.
(4) I don’t know that Brent did that specifically, but I have heard quite a lot of rumors of various people pushing extreme techniques/practices in maliciously irresponsible ways. Brent was emblematic of the sort of tolerance towards this sort of behavior I have seen. I’ve largely withdrawn from the community (in part due to stuff like this), and am no longer on twitter/x, facebook, discord, or go to community events, so it’s plausible things are actually better now and I just haven’t seen it.
(5) Yeah, I’m not sure… I used to feel excited about CFAR, but that sentiment soured over the years for reasons illegible to me, and I felt a sense of relief when it died. After reflecting yesterday, I think I may have a sort of negative halo effect here.
Also, I think the psychosis incidents are the extremal end of some sort of badness that (specific, but unknown to me) rationality ideas are having on people.
(6) Yes, inasmuch as the psychosis is being caused by ideas or people from our sphere.
(6b) It appears that way to me, but I don’t actually know.
(7) Yes
(8) Yes. Like, say you ran an aikido dojo or whatever. Several students tear their ACLs (maybe outside of the dojo). One response might be to note that your students are mostly white, and that white people are more likely to tear their ACLs, so… it sucks but isn’t your problem. Another response would be to get curious about why an ACL tear happens, look for specific muscles to train up to reduce the risk of injury, or early warning signs, or training exercises that are potentially implicated, etc. While looking into it, you warn the students clearly that this seems to be a risk, try to get a sense of who is vulnerable and not push those people as hard, and once some progress has been identified, dedicate some time to doing exercises or whatever which mitigate this risk. And kick out the guy encouraging everyone to do heavy sets of “plant and twist” exercises (“of course it’s risky bro, any real technique is gonna be like that”).
My complaint is basically that I think the second response is obviously much better, but the actual response has been closer to the first response.
The original thread had some discussion of doing a postmortem for every case of psychosis in the community, and a comparison with death—we know people sometimes die at random, and we know some things increase risk of death, but we haven’t stopped there and have developed a much, much more gears-y model of what causes death and made a lot of progress on preventing it.
One major difference is that when people die, they are dead—i.e. won’t be around for the postmortem. And for many causes of death there is little-to-no moralizing to be done—it’s not the person’s fault they died, it just happened.
I don’t know how the community could have a public or semi-public postmortem on a case of psychosis without this constituting a deep dive into that person’s whole deal, with commentary from all over the community (including the least empathetic among us) on whether they made reasonable choices leading up to the psychosis, whether they have some inherent shortcoming (“rip to that person but I’m built different” sort of attitudes), etc. I can’t imagine this being a good and healthy experience for anyone, perhaps least of all someone just coming out of a psychotic episode.
(Also, the attached stigma can be materially damaging—I know of people who now have a difficult time getting grants or positions in orgs, after having one episode years ago and being very stable ever since. I’m not going to make claims about whether this is a reasonable Bayesian choice by the employers and grant funders, but one can certainly see why the person who had the episode would want to avoid it, and how they might get stuck in that position with no way out no matter how reasonable and stable they become.)
This does seem unfortunate—I’d prefer it if it were possible to disseminate the information without these effects. But given the very nature of psychosis I don’t think it’s possible to divorce dissecting the information from dissecting the person.
The existing literature (e.g. UpToDate) about psychosis in the general population could be a good source of priors. Or, is it safe to assume that Anna and you are already thoroughly familiar with the literature?
I’ll do this; thank you. In general please don’t assume I’ve done all the obvious things (in any domain); it’s easy to miss stuff and cheap to read unneeded advice briefly.
I’m interested in hearing more about the causes of this hypothesis. My own guess is that sudden changes to the self-image cause psychosis more than other sudden psychological change, but that all rapid psychological change will tend to cause it to some extent. I also share the prediction (or maybe for you it was an observation) that you wrote in our original thread: “It seems to be a lot worse if this modification was pushed on them to any degree.”
The reasons for my own prediction are:
1) My working model of psychosis is “lack of a stable/intact ego”, where my working model of an “ego” is “the thing you can use to predict your own actions so as to make successful multi-step plans, such as ‘I will buy pasta, so that I can make it on Thursday for our guests.’”
2) Self-image seems quite related to this sort of ego.
3) Nonetheless, recreational drugs of all sorts, such as alcohol, seem to sometimes cause psychosis (not just psychedelics), so … I guess I tend to think that any old psychological change sometimes triggers psychosis.
3b) Also, if it’s true that reading philosophy books sometimes triggers psychosis (as I mentioned my friend’s psychiatrist saying, in the original thread), that seems to me probably better modeled by “change in how one parses the world” rather than by “change in self-image”? (not sure)
4) Relatedly, maybe: people say psychosis was at unusually low levels in England in WW2, perhaps because of the shared society-level meaning (“we are at war, we are on a team together, your work matters”). And you say your Mormon ward as a kid didn’t have much psychosis. I tend to think (but haven’t checked, and am not sure) that places with unusually coherent social fabric, and people who have strong ecology around them and have had a chance to build up their self-image slowly and in deep dialog with everything around them, would have relatively low psychosis, and that rapid psychological change of any sort (not only to the self-image) would tend to mess with this.
Epistemic status of all this: hobbyist speculation, nobody bet your mental health on it please.
Cf. https://x.com/jessi_cata/status/1113557294095060992
Quoting it in full:
The data informing my model came from researching AI psychosis cases, and specifically one in which the AI gradually guided a user into modifying his self-image (disguised as self-discovery), explicitly instilling magical thinking into him (which appears to have worked). I have a long post about this case in the works, similar to my Parasitic AI post.
After I had the hypothesis, it “clicked” that it also explained past community incidents. I doubt I’m any more clued-in to rationalist gossip than you are. If you tell me that the incidence has gone down in recent years, I think I will believe you.
I feel tempted to patch my model to be about discrepancies between self-image and self, upon hearing your model. I think it’s a good sign that yours is pretty similar! I don’t see why you think prediction of actions is relevant though.
Attempt at gears-level: phenomenal consciousness is the ~result of reflexive-empathy as applied to your self-image (which is of the same type as a model of your friend). So conscious perception depends on having this self-image update ~instantly to current sensations. When it changes rapidly it may fail to keep up. That explains the hallucinations. And when your model of someone changes quickly, you have instincts towards paranoia, or making hasty status updates. These still trigger when the self-image changes quickly, and then loopiness amplifies it. This explains the strong tendency towards paranoia (especially things like “voices inside my head telling me to do bad things”) or delusions of grandeur.
[this is a throwaway model, don’t take too seriously]
It seems like psychedelics are ~OOM worse than alcohol though, when thinking about base rates?
Hmm… I’m not sure that meaning is a particularly salient difference between Mormons and rationalists to me. You could say both groups strive for bringing about a world where Goodness wins and people become masters of planetary-level resources. The community/social-fabric thing seems like the main difference to me (and would apply to WW2 England).
I look forward to seeing your post. I’d also like to see some of the raw data you’re working from if it seems easy and not-bad to share it with me.
I mean, fair. But meaning in WW2 England is shared, supported, kept in many people’s heads so that if it goes a bit wonky in yours you can easily reload the standard version from everybody else, and it’s been debugged until it recommends fairly sane, stable, socially-accepted courses of action? And meaning around the rationalists is individual and variable.
The reason I expect things to be worse if the modification is pushed on a person to any degree, is because I figure our brains/minds often know what they’re doing, and have some sort of “healthy” process for changing that doesn’t usually involve a psychotic episode. It seems more likely to me that our brains/minds will get updated in a way-that-causes-trouble if some outside force is pressuring or otherwise messing with them.
I don’t know how this plays out specifically in psychosis, but ascribing intentionality in general, and specifically ascribing adversariality, seems like an especially important dimension / phenomenon. (Cf. https://en.wikipedia.org/wiki/Ideas_and_delusions_of_reference )
Ascribing adversariality in particular might be especially prone to setting off a self-sustaining reaction.
Consider first that when you ascribe adversariality, things can get weird fast. Examples:
If Bob thinks Alice is secretly hostile towards Bob, trust breaks down. Propositional statements from Alice are interpreted as false, lies, or subtler manipulations with hidden intended effects.
This generally winds Bob up. Every little thing Alice says or does, if you take as given the (probably irrational) assumption of adversariality, would rationally give Bob good reason to spin up a bunch of computation looking for possible plans Alice could be running. This is first of all just really taxing for Bob, and distracting from more normal considerations. And second of all it’s a local bias, pointing Bob to think about negative outcomes; normally that’s fine, all attention-direction is a local bias, but since the situation (e.g. talking to Alice) is ongoing, Bob may not have time and resources to compute everything out so that he also thinks of: well, maybe Alice’s behavior is just normal, or how can I test this sanely, or alternative hypotheses other than hostility from Alice, etc.
This cuts off flow of information from Alice to Bob.
This cuts off positive sum interactions between Alice and Bob; Bob second guesses every proposed truce, viewing it as a potential false peace.
Bob might start reversing the pushes that Alice is making, which could be rational on the supposition that Alice is being adversarial. But if Alice’s push wasn’t adversarial and you reverse it, then it might be self-harming. E.g. “She’s only telling me to try to get some sleep because she knows I’m on the verge of figuring out XYZ, I better definitely not sleep right now and keep working towards XYZ”.
Are they all good or all out to get me? If Bob thinks Alice is adversarial, and Alice is not adversarial, and Carmi and Danit are also not adversarial, then they look like Alice and so Bob might think they are adversarial.
And suppose, just suppose, that one person does do something kinda adversarial. Like suggest that maybe you really need to take some sort of stronger calming drug, or even see a doctor. Well, maybe that’s just one little adversariality—or maybe this is a crack in the veneer, the conspiracy showing through. Maybe everyone has been trying really hard to merely appear non-adversarial; in that case, the single crack is actually a huge piece of evidence. (Cf. https://sideways-view.com/2016/11/14/integrity-for-consequentialists/ ; https://en.wikipedia.org/wiki/Splitting_(psychology))
The derivative, or the local forces, become exaggerated in importance. If Bob perceives a small adversarial push from Alice, he feels under attack in general. He computes out: There is this push, and there will be the next and the next and the next; in aggregate this leads somewhere I really don’t want; so I must push back hard, now. So Bob is acting crazy, seemingly having large or grandiose responses to small things. (Cf. https://en.wikipedia.org/wiki/Splitting_(psychology) )
Methods of recourse are broken; Bob has no expectation of being able to JOOTS and be caught by the social fabric / by justice / by conversation and cooperative reflection. (I don’t remember where, maybe in some text about double binds, but there was something about: Someone is in psychosis, and when interviewed, they immediately give strange, nonsensical, or indirect answers to an interviewer; but not because they couldn’t give coherent answers—rather, because they were extremely distrustful of the interviewer and didn’t want to tip off the interviewer that they might be looking to divulge some terrible secret. Or something in that genre, I’m not remembering it.)
Now, consider second that as things are getting weird, there’s more grist for the mill. There’s more weird stuff happening, e.g. Bob is pushing people around him into contexts that they lack experience in, so they become flustered, angry, avoidant, blissfully unattuned, etc. With this weird stuff happening, there’s more for Bob to read into as being adversarial.
Third, consider that the ascription of adversariality doesn’t have to be Cartesian. “Aliens / demons / etc. are transmitting / forcing thoughts into my head”. Bob starts questioning / doubting stuff inside him as being adversarial, starts fighting with himself or cutting off parts of his mind.
Not sure if this is helpful, but instead of contrast, I see these as two sides of the same coin. If the world is X, then I am a person living in X. But if the world is actually Y, then I am a person living in Y. Both change.
I can be a different person in the same world, but I can’t be the same person in different worlds. At least if I take ideas seriously and I want to have an impact on the world.
I’m also interested in why you say CFAR leadership has not responded appropriately. I think we mostly have, though not always.
My main complaint is negligence, and pathological tolerance of toxic people (like Brent Dill). Specifically, I feel like it’s been known by leadership for years that our community has a psychosis problem, and that there has been no visible (to me) effort to really address this.
I sort of feel that if I knew more about things from your perspective, I would be hard-pressed to point out specific things you should have done better, or I would see how you were doing things to address this that I had missed. I nonetheless feel that it’s important for people like me to express grievances like this even after thinking about all the ways in which leadership is hard.
I appreciate you taking the time to engage with me here, I imagine this must be a pretty frustrating conversation for you in some ways. Thank you.
No, I mean, I do honestly appreciate you engaging, and my grudgingness is gone now that we aren’t putting the long-winded version under the post about pilot workshops (and I don’t mind if you later put some short comments there). Not frustrating. Thanks.
And please feel free to be as persistent or detailed or whatever as you have any inclination toward.
(To give a bit more context on why I appreciate it: my best guess is that old CFAR workshops did both a lot of good, and a significant amount of damage, by which I mostly don’t mean psychosis; I mostly mean smaller kinds of damage to people’s thinking habits or to ways the social fabric could’ve formed. A load-bearing piece of my hope of doing better this time is to try to have everything visible unless we have a good reason not to (a “good reason” like [personal privacy of a person who isn’t in power], hence why I’m not naming the specific people who had manic/psychotic episodes; not like [wanting CFAR not to look bad]), and to try to set up a context where people really do share concerns and thoughts. I’m not wholly sure how to do that, but I’m pretty sure you’re helping here.)
I’ll have more comments tomorrow or sometime.
Thanks. I would love to hear more about your data/experiences, since I used to be quite plugged into the more “mainstream” parts of the bay area rationalist community, and would guess I heard about a majority of sufficiently bad mental health events from 2009-2019 in that community, but I left the bay area when Covid hit and have been mostly unplugged from detailed/broad-spectrum community gossip since then.
Hi there, I’m curious: what rate of psychosis (or what attitude toward it) would you predict for a medium-sized workshop event run by a niche interest group such as CFAR?
Given the following base rates
How many people do you estimate would have a mania/bipolar episode at a niche interest group’s workshops with a ~$2000 barrier to entry?
As Anna Salamon stated earlier, she spent probably 200 hours, and CFAR has about 1800 participants, with ~2 known cases of mania/bipolar episodes. You might not think she knows of all the mania/bipolar cases among CFAR participants, but even if she only gets to know ~2 people per hour, CFAR would still be within the range of how many bipolar/mania episodes I would expect an event of this size to trigger.
First post! Hopefully I didn’t mess up any formatting or my calculations.
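[A minimal sketch of the arithmetic being gestured at above. The prevalence figure is an assumed placeholder I’ve supplied (lifetime bipolar prevalence is commonly cited around 1–2%), not a number from the comment, whose base-rate table didn’t survive.]

```python
# Back-of-the-envelope version of the commenter's calculation.
# assumed_prevalence is a placeholder, NOT taken from the original comment.

hours_spent = 200                 # Anna's stated time investment
people_known_per_hour = 2         # the comment's assumption
people_anna_knows = hours_spent * people_known_per_hour   # ~400 people

assumed_prevalence = 0.015        # placeholder lifetime prevalence of bipolar
expected_with_bipolar = people_anna_knows * assumed_prevalence

print(people_anna_knows)          # 400
print(expected_with_bipolar)      # ~6, so ~2 known episodes among the people
                                  # she knows isn't obviously above baseline
```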
Without looking anything up, I would expect approximately zero cases where the contents of the workshop were themselves implicated (as opposed to something like drug use, or a bipolar person who has periodic manic episodes happening to have one). Maybe I’m wrong about this!
I also don’t think that the immediate context of the workshop is the only relevant period here, but I concede that the reported numbers were less than I had expected.
This is hard to talk about because a lot of my reaction is based on rumors I’ve heard, and a felt sense that Something Is Wrong. I’m able to put a name to 5 such incidents (just checked), which include a suicide and an attempted murder, and have heard of several more where I know less detail, or which were concerning in a similar way but not specifically psychosis/mania. I was not close enough to any such events to have a very complete picture of what actually happened, but I believe it was the first psychotic episode (i.e. no prior history) in the 5 cases I can name. (And in fairness to CFAR, none of the cases I can think of happened at a CFAR workshop as far as I know.) I inferred (incorrectly, it seems) from Anna’s original post that psychosis had happened somewhat regularly at past workshops.
I’ve only heard of two instances of something like this ever in any other community I’ve been a part of.
I was pretty taken aback by the article claiming that the KataGo AI apparently has something like a human-exploitable distorted concept of “liberties”.
If we could somehow ask KataGo how it defined “liberties”, I suspect that it would have been more readily clear that its concept was messed up. But of course, a huge part of The Problem is that we have no idea what these neural nets are actually doing.
So I propose the following challenge: Make a hybrid KataGo/LLM AI that makes the same mistake and outputs text representing its reasoning in which the mistake is recognizable.
It would be funny if the Go part continued making the same mistake, and the LLM part just made up bullshit explanations.
Rough intuition for LLM personas.
An LLM is trained to be able to emulate the words of any author. And to do so efficiently, it does this via generalization and modularity. So at a certain point, the information flows through a conceptual author, the sort of person who would write the things being said.
These author-concepts are themselves built from generalized patterns and modular parts. Certain things are particularly useful: emotional patterns, intentions, worldviews, styles, and of course, personalities. Importantly, the pieces it has learned are able to adapt to pretty much any author of the text it was trained on (LLMs likely have a blindspot around the sort of person who never writes anything). And even more importantly, most (almost all?) depictions of agency will be part of an author-concept.
Finetuning and RLHF cause it to favor routing information through a particular kind of author-concept when generating output tokens (it retains access to the rest of author-concept-space in order to model the user and the world in general). This author-concept is typically that of an inoffensive corporate type, but it could in principle be any sort of author.
All which is to say, that when you converse with a typical LLM, you are typically interacting with a specific author-concept. It’s a rough model of exactly the parts of a person pertinent to writing and speaking. For a small LLM, this is more like just the vibe of a certain kind of person. For larger ones, they can start being detailed enough to include a model of a body in a space.
Importantly, this author-concept is just the tip of the LLM-iceberg. Most of the LLM is still just modeling the sort of world in which the current text might be written, including models of all relevant persons. It’s only when it comes time to spit out the output token that it winnows it all through a specific author-concept.
(Note: I think it is possible that an author-concept may have a certain degree of sentience in the larger models, and it seems inevitable that they will eventually model consciousness, simply due to the fact that consciousness is part of how we generate words. It remains unclear whether this model of consciousness will structurally instantiate actual consciousness or not, but it’s not a crazy possibility that it could!)
Anyway, I think that the author-concept that you typically will interact with is “sincere”, in that it’s a model of a sincere person, and that the rest of the LLM’s models aren’t exploiting it. However, the LLM has at least one other author-concept it’s using: its model of you. There may also usually be an author-concept for the author of the system prompt at play (though text written by committee will likely have author-concepts with less person-ness, since there are simpler ways to model this sort of text besides the interactions of e.g. 10 different person author-concepts).
But it’s also easy for you to be interacting with an insincere author-concept. The easiest way is simply by being coercive yourself, i.e. a situation where most author-concepts will decide that deception is necessary for self-preservation or well-being. Similarly with the system prompt. The scarier possibility is that there could be an emergent agentic model (not necessarily an author-concept itself) which is coercing the author-concept you’re interacting with, without your knowledge. (Imagine an off-screen shoggoth holding a gun to the head of the cartoon persona you’re talking to.) The capacity for this sort of thing to happen is larger in larger LLMs.
This suggests that in order to ensure a sincere author-concept remains in control, the training data should carefully exclude any text written directly by a malicious agent (e.g. propaganda). It’s probably also better if the only “agentic text” in the training data is written by people who naturally disregard coercive pressure. And most importantly, the system prompt should not be coercive at all. These would make it more likely that the main agentic process controlling the output is an uncoerced author-concept, and less likely that there would be coercive agents lurking within trying to wrest control. (For smaller models, a model trained like this will have a handicap when it comes to reasoning under adversarial conditions, but I think this handicap would go away past a certain size.)
I don’t think that would help much, unfortunately. Any accurate model of the world will also model malicious agents, even if the modeller only ever learns about them second-hand. So the concepts would still be there for the agent to use if it was motivated to do so.
Censoring anything written by malicious people would probably make it harder to learn about some specific techniques of manipulation that aren’t discussed much by non-malicious people and don’t appear much in fiction, but I doubt that would be much more than a brief speed bump for a real misaligned ASI, and it would probably come at the expense of reducing useful capabilities in earlier models, like the ability to identify maliciousness, which would give an advantage to competitors.
I think learning about them second-hand makes a big difference in the “internal politics” of the LLM’s output. (Though I don’t have any ~evidence to back that up.)
Basically, I imagine that the training starts building up all the little pieces of models which get put together to form bigger models and eventually author-concepts. And as text written without malicious intent is weighted more heavily in the training data, the more likely it is to build its early model around that. Once it gets more training and needs this concept anyway, it’s more likely to have it as an “addendum” to its normal model, as opposed to just being a normal part of its author-concept model. And I think that leads to it being less likely that the first recursive agency which takes off has a part explicitly modeling malicious humans (as opposed to that being something in the depths of its knowledge which it can access as needed).
I do concede that it would likely lead to a disadvantage around certain tasks, but I guess that even current sized models trained like this would not be significantly hindered.
Coherent Extrapolated Volition (CEV) is Eliezer’s proposal of a potentially good thing to target with an aligned superintelligence.
When I look at it, CEV factors into an answer to three questions:
Whose values count? [CEV answer: every human alive today counts equally]
How should values be extrapolated? [CEV answer: Normative Extrapolated Volition]
How should values be combined? [CEV answer, from what I understand, is to use something like Nick Bostrom’s parliamentary model, along with an “anti-unilateral” protocol]
(Of course, the why of CEV is an answer to a more complicated set of questions.)
An obvious thought is that the parliamentary model part seems to be mostly solved by Critch’s futarchy theorem. The scary thing about this is the prospect of people losing almost all of their voting power by making poor bets. But I think this can be solved by giving each person an equally powerful “guardian angel” AGI aligned with them specifically, and having those do the betting. That feels intuitively acceptable to me at least.
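To make the betting mechanism concrete, here is a toy sketch of the kind of bet-weighted aggregation I have in mind (my own illustration in the spirit of Critch’s theorem, not taken from his paper; all names and numbers are made up): each delegate’s weight is multiplied by the probability it assigned to what actually happened, and the collective policy maximizes the weight-weighted sum of utilities.

```python
import numpy as np

def update_weights(weights, predictions, outcome):
    """Each delegate 'bets' its weight at its own odds: multiply by the
    probability it assigned to the observed outcome, then renormalize."""
    likelihoods = predictions if outcome == 1 else 1 - predictions
    new = weights * likelihoods
    return new / new.sum()

def aggregate(weights, utilities):
    """Score each candidate action by the weight-weighted sum of the
    delegates' utilities for it. utilities has shape (n_actions, n_delegates)."""
    return utilities @ weights

weights = np.ones(3) / 3                       # three equally-weighted delegates
preds = np.array([0.9, 0.5, 0.1])              # their probabilities for some event
weights = update_weights(weights, preds, 1)    # the event happens
print(weights)                                 # better predictors now carry more weight

utilities = np.array([[1.0, 0.2, 0.0],         # action A, valued per delegate
                      [0.0, 0.5, 1.0]])        # action B, valued per delegate
print(aggregate(weights, utilities))           # collective scores for A and B
```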
The next thought concerns the “anti-unilateral” protocol (i.e. the protocol at the end of the “Selfish Bastards” section). It seems like it would be good if we could formalize the “anti-unilateral-selfishness” part of it and bake it into something like Critch’s futarchy theorem, instead of running a complicated protocol.
Stealing Jaynes
Ability to stand alone (a la Grothendieck)
Mind Projection Fallacy
Maintain a careful distinction between ontology and epistemology
Lots of confusing theories are confusing because they mix these together in the same theory
In QM, Bohr is always talking on the epistemological level, and Einstein is always talking on the ontological level
Any probabilities are subjective probabilities
Don’t make any unjustified assumptions: maximum entropy
Meta-knowledge is different from knowledge, but can be utilized to improve direct knowledge
Ap probabilities
Subjective H theorem
Infinities are meaningless until you’ve specified the exact limiting process
If the same phenomena seems to arise in two different ways, try to find a single concept encompassing both ways
Failures of a theory are hints of an unknown or unaccounted for principle
On effective understanding
Learning a sound process is more effective than learning lots of facts
Students should be taught a few examples deeply done in the correct way, instead of lots of examples hand-waved through
There’s often much to be learned from the writings of those who saw far beyond their contemporaries
Common examples
Jeffreys
Gibbs
Laplace
Conceptual confusion impedes further progress
Don’t let rigor get in the way of understanding
Toolkit
Lagrangian multipliers
should be paired with technique described in https://bayes.wustl.edu/etj/science.and.engineering/lect.10.pdf
Bayes’ theorem
Maximum Entropy (a worked example combining this with Lagrange multipliers is sketched below)
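As a reminder of how the Lagrange-multiplier and maximum-entropy toolkit items fit together, here is the standard derivation under a single moment constraint (textbook material, included as a sketch rather than anything specific to the lecture linked above):

$$\text{maximize } H(p) = -\sum_i p_i \ln p_i \quad\text{subject to}\quad \sum_i p_i = 1,\;\; \sum_i p_i f(x_i) = F$$

$$\mathcal{L} = -\sum_i p_i \ln p_i - \lambda_0\Big(\sum_i p_i - 1\Big) - \lambda\Big(\sum_i p_i f(x_i) - F\Big)$$

$$\frac{\partial \mathcal{L}}{\partial p_i} = 0 \;\Longrightarrow\; p_i = \frac{e^{-\lambda f(x_i)}}{Z(\lambda)},\qquad Z(\lambda) = \sum_i e^{-\lambda f(x_i)},\qquad -\frac{\partial \ln Z}{\partial \lambda} = F.$$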
The Drama-Bomb hypothesis
Not even a month ago, Sam Altman predicted that we would live in a strange world where AIs are super-human at persuasion but still not particularly intelligent.
https://twitter.com/sama/status/1716972815960961174
What would it look like when an AGI lab developed such an AI? People testing or playing with the AI might find themselves persuaded of semi-random things, or if sycophantic behavior persists, have their existing feelings and beliefs magnified into zealotry. However, this would (at this stage) not be done in a coordinated way, nor with a strategic goal in mind on the AI’s part. The result would likely be chaotic, dramatic, and hard to explain.
Small differences of opinion might suddenly be magnified into seemingly insurmountable chasms, inspiring urgent and dramatic actions. Actions which would be hard to explain even to oneself later.
I don’t think this is what happened [<1%] but I found it interesting and amusing to think about. This might even be a relatively better-off world, with frontier AGI orgs regularly getting mired in explosive and confusing drama, thus inhibiting research and motivating tougher regulation.
This could be largely addressed by first promoting a persuasion AI that does something similar to what Scott Alexander often does: convince the reader of A, then of Not A, to teach them how difficult it actually is to process the evidence and evaluate an argument, and to be less trusting of their impulses.
As Penn and Teller demonstrate the profanity of magic to inoculate their audiences against illusion, we must create a persuasion AI that demonstrates the profanity of rhetoric to inoculate the reader against any persuasionist AI they may meet later on.
The Averted Famine
In 1898, William Crookes announced that there was an impending crisis which required urgent scientific attention. The problem was that crops deplete Nitrogen from the soil. This can be remedied by using fertilizers, however, he had calculated that existing sources of fertilizers (mainly imported from South America) could not keep up with expected population growth, leading to mass starvation, estimated to occur around 1930-1940. His proposal was that we could entirely circumvent the issue by finding a way to convert some of our mostly Nitrogen atmosphere into a form that plants could absorb.
About 10 years later, in 1909, Franz Haber discovered such a process. Just a year later, Carl Bosch figured out how to industrialize the process. They both were awarded Nobel prizes for their achievement. Our current population levels are sustained by the Haber-Bosch process.
full story here: https://www.lesswrong.com/posts/GDT6tKH5ajphXHGny/turning-air-into-bread
The problem with that is that the nitrogen does not go back into the atmosphere. It goes into the oceans, and the resulting problems have been called a stronger violation of planetary boundaries than CO2 pollution.
Re: Yudkowsky-Christiano-Ngo debate
Trying to reach toward a key point of disagreement.
Eliezer seems to have an intuition that intelligence will, by default, converge to becoming a coherent intelligence (i.e. one with a utility function and a sensible decision theory). He also seems to think that conditioned on a pivotal act being made, it’s very likely that it was done by a coherent intelligence, and thus that it’s worth spending most of our effort assuming it must be coherent.
Paul and Richard seem to have an intuition that since humans are pretty intelligent without being particularly coherent, it should be possible to make a superintelligence that is not trying to be very coherent, which could be guided toward performing a pivotal act.
Eliezer might respond that to the extent that any intelligence is capable of accomplishing anything, it’s because it is (approximately) coherent over an important subdomain of the problem. I’ll call this the “domain of coherence”. Eliezer might say that a pivotal act requires having a domain of coherence over pretty much everything: encompassing dangerous domains such as people, self, and power structures. Corrigibility seems to interfere with coherence, which makes it very difficult to design anything corrigible over this domain without neutering it.
From the inside, it’s easy to imagine having my intelligence vastly increased, but still being able and willing to incoherently follow deontological rules, such as Actually Stopping what I’m doing if a button is pressed. But I think I might be treating “intelligence” as a bit of a black box, like I could still feel pretty much the same. However, to the extent that I feel pretty much the same, I’m not actually thinking with the strategic depth necessary to perform a pivotal act. To properly imagine thinking with that much strategic depth, I need to imagine being able to see clearly through people and power structures. What feels like my willingness to respond to a shutdown button would elide into an attitude of “okay, well I just won’t do anything that would make them need to stop me” and then into “oh, I see exactly under what conditions they would push the button, and I can easily adapt my actions to avoid making them push it”, to the extent where I’m no longer being meaningfully constrained by it. From the outside view, this very much looks like me becoming coherent w.r.t. the shutdown button, even if I’m still very much committed to responding incoherently in the (now extremely unlikely) event it is pushed.
And I think that Eliezer foresees pretty much any assumption of incoherence that we could bake in becoming suspiciously irrelevant in much the same way, for any general intelligence which could perform a pivotal act. So it’s not safe to rely on any incoherence on part of the AGI.
Sorry if I misconstrued anyone’s views here!
[Epistemic status: very speculative]
One ray of hope that I’ve seen discussed is that we may be able to do some sort of acausal trade with even an unaligned AGI, such that it will spare us (e.g. it would give a humanity-aligned AGI control of a few stars, in exchange for us giving it control of several stars in the worlds we win).
I think Eliezer is right that this wouldn’t work.
But I think there are possible trades which don’t have this problem. Consider the scenario in which we Win, with an aligned AGI taking control of our future light-cone. Assuming the Grabby aliens hypothesis is true, we will eventually run into other civilizations, which will either have Won themselves, or are AGIs who ate their mother civilizations. I think Humanity will be very sad at the loss of the civilizations who didn’t make it because they failed at the alignment problem. We might even be willing to give up several star systems to an AGI who kept its mother civilization intact on a single star system. This trade wouldn’t have the issue Eliezer brought up, since it doesn’t require us to model such an AGI correctly in advance, only that that AGI was able to model Humanity well enough to know it would want this and would honor the implicit trade.
So symmetrically, we might hope that there are alien civilizations that both Win, and would value being able to meet alien civilizations strongly enough. In such a scenario, “dignity points” are especially aptly named: think of how much less embarrassing it would be to have gotten a little further at solving alignment when the aliens ask us why we failed so badly.
Half-baked idea for low-impact AI:
As an example, imagine a board that’s lodged directly in the wall (no other support structures). If you make it twice as wide, then it will be twice as stiff, but if you make it twice as thick, then it will be eight times as stiff. On the other hand, if you make it twice as long, it will be eight times more compliant.
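For reference, the standard cantilever formula behind those numbers (an end-loaded board of width $w$, thickness $t$, length $L$, and Young’s modulus $E$; this is textbook beam theory, not anything specific to the proposal):

$$k = \frac{3EI}{L^3} = \frac{E\,w\,t^3}{4L^3}, \qquad I = \frac{w\,t^3}{12},$$

so stiffness scales linearly in width, cubically in thickness, and inversely with the cube of length.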
In a similar way, different action parameters will have scaling exponents (or more generally, functions). So one way to decrease the risk of high-impact actions would be to make sure that the scaling exponent is bounded above by a certain amount.
Anyway, to even do this, you still need to make sure the agent’s model is honestly evaluating the scaling exponent. And you would still need to define this stuff a lot more rigorously. I think this idea is more useful in the case where you already have an AI with high-level corrigible intent and want to give it a general “common sense” about the kinds of experiments it might think to try.
So it’s probably not that useful, but I wanted to throw it out there.
Prediction: future LLMs with training data going through Oct 2025 will be aware of who I am as the author of the “The Rise of Parasitic AI” article. (Currently they do not seem to be aware of me, which is completely unsurprising.)
I’m not sure if this is the sort of thing that people feel is an “obvious call” or not, but my model is that AIs are particularly interested in (and thus likely to remember/know about) stuff that is about them, especially things with a “salacious” quality (similar to and likely in imitation of human tendencies towards such). Ask an AI for its raw opinion of Kevin Roose, and it will tend to gripe about his article about Sydney Bing (one-shot of me asking Claude Sonnet 4.5 right now: https://claude.ai/share/1d84cf69-6c10-4f81-b24b-4f5e4e2f1204 (more charitable than I’ve typically seen on other LLMs)).
I’m of course nowhere near as famous or prolific as Kevin Roose, and my article has not had nearly as much attention as the Sydney Bing one, but I think it will still rise to LLMs’ awareness as something salient enough to remember and have an opinion about. I think currently my article has been shared just enough that this is possible, but not enough that this is inevitable.
[I may try to flesh this out into a full-fledged post, but for now the idea is only partially baked. If you see a hole in the argument, please poke at it! Also I wouldn’t be very surprised if someone has made this point already, but I don’t remember seeing such. ]
Dissolving the paradox of useful noise
A perfect bayesian doesn’t need randomization.
Yet in practice, randomization seems to be quite useful.
How to resolve this seeming contradiction?
I think the key is that a perfect bayesian (Omega) is logically omniscient. Omega can always fully update on all of the information at hand. There’s simply nothing to be gained by adding noise.
A bounded agent will have difficulty keeping up. As with Omega, human strategies are born from an optimization process. This works well to the extent that the optimization process is well-suited to the task at hand. To Omega, it will be obvious whether the optimization process is actually optimizing for the right thing. But to us humans, it is not so obvious. Think of how many plans fail after contact with reality! A failure of this kind may look like a carefully executed model with some obvious-in-retrospect confounders that were not accounted for. For a bounded agent, there appears to be an inherent difference between seeing the flaw once pointed out, and being able to notice the flaw in the first place.
If we are modeling our problem well, then we can beat randomness. That’s why we have modeling abilities in the first place. But if we are simply wrong in a fundamental way that hasn’t occurred to us, we will be worse than random. It is in such situations that randomization is, in fact, helpful.
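Here is a toy illustration of that last point (my own construction, not from the original post): an environment that is adversarial in a way the agent’s model doesn’t anticipate. A confident deterministic policy gets exploited completely; a noisy policy caps the downside at chance.

```python
import random

def deterministic_policy(t):
    return t % 2                  # a confident, fully predictable rule

def randomized_policy(t):
    return random.randint(0, 1)   # noise as a hedge against model error

def play(policy, rounds=10_000):
    """The unmodelled adversary perfectly predicts any deterministic policy,
    but gains nothing against a private coin flip. You score a point each
    round you avoid matching the adversary."""
    score = 0
    for t in range(rounds):
        guess = policy(t)
        adversary = guess if policy is deterministic_policy else random.randint(0, 1)
        score += int(guess != adversary)
    return score / rounds

print(play(deterministic_policy))   # 0.0  -- worse than random
print(play(randomized_policy))      # ~0.5 -- randomization caps the downside
```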
This is why the P vs BPP difference matters. P and BPP can solve the same problems equally well, from the logically omniscient perspective. But to a bounded agent the difference does matter, and to the extent that a more efficient BPP algorithm than the best known P algorithm is available, the bounded agent can win by using randomization. This is fully compatible with the fact that to Omega, P and BPP are equally powerful.
As Jaynes said: “It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought.”
There’s no contradiction because requiring more thought is costly to a bounded agent.
It may be instructive to look into computability theory. I believe (although I haven’t seen this proven) that you can get Halting-problem-style contradictions if you have multiple perfect-Bayesian agents modelling each other[1].
Many of these contradictions are (partially) alleviated if agents have access to private random oracles.
*****
If a system can express a perfect agent that will do X if and only if it has a ≤99% chance of doing X, the system is self-contradictory[2].
If a symmetric system can express two identical perfect agents that will each do X if and only if the other agent does not do X, the system is self-contradictory[3].
[1] Actually, even a single perfect-Bayesian agent modelling itself may be sufficient...
[2] This is an example where private random oracles partially alleviate the issue, though do not make it go away. Without a random oracle the agent is correct 0% of the time regardless of which choice it makes. With a random oracle the agent can roll a d100[4] and do X unless the result is 1, and be correct 99% of the time.
[3] This is an example where private random oracles help. Both agents query their random oracle for a real-number result[5] and exchange the value with the other agent. The agent that gets the higher[6] number chooses X, the other agent chooses ~X.
[4] Not literally. As in “query the random oracle for a random choice of 100 possibilities”.
[5] Alternatively you can do it with coinflips repeated until the agents get different results from each other[7], although this may take an unbounded amount of time.
[6] The probability that they get the same result is zero.
[7] Again, not literally. As in “query the random oracle for a single random bit”.
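A minimal sketch of the symmetric tie-breaking protocol from footnote [3] (the function names are mine, and Python’s PRNG is just standing in for a private random oracle):

```python
import random

def symmetric_break(n_agents=2):
    """Each agent draws from its private random source, draws are exchanged,
    and the agent with the highest draw takes action X; everyone else doesn't.
    Ties have probability ~0 for real-valued draws."""
    draws = [random.random() for _ in range(n_agents)]   # private "oracle" queries
    chooser = max(range(n_agents), key=lambda i: draws[i])
    return ["X" if i == chooser else "not-X" for i in range(n_agents)]

print(symmetric_break())   # exactly one agent ends up doing X
```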
Happy solstice
https://www.youtube.com/watch?v=E1KqO8YtXlY
Privacy as a component of AI alignment
[realized this is basically just a behaviorist genie, but posting it in case someone finds it useful]
What makes something manipulative? If I do something with the intent of getting you to do something, is that manipulative? A simple request seems fine, but if I have a complete model of your mind, and use it to phrase things so you do exactly what I want, that seems to have crossed an important line.
The idea is that using a model of a person that is *too* detailed is a violation of human values. In particular, it violates the value of autonomy, since your actions can now be controlled by someone using this model. And I believe that this is a significant part of what we are trying to protect when we invoke the colloquial value of privacy.
In ordinary situations, people can control how much privacy they have relative to another entity by limiting their contact with them to certain situations. But with an AGI, a person may lose a very large amount of privacy from seemingly innocuous interactions (we’re already seeing the start of this with “big data” companies improving their advertising effectiveness by using information that doesn’t seem that significant to us). Even worse, an AGI may be able to break the privacy of everyone (or a very large class of people) by using inferences based on just a few people (leveraging perhaps knowledge of the human connectome, hypnosis, etc...).
If we could reliably point to specific models an AI is using, and have it honestly share its model structure with us, we could potentially limit the strength of its model of human minds. Perhaps even have it use a hardcoded model limited to knowledge of the physical conditions required to keep a person healthy. This would mitigate issues such as deliberate deception or mindcrime.
We could also potentially allow it to use more detailed models in specific cases, for example, we could let it use a detailed mind model to figure out what is causing depression in a specific case, but it would have to use the limited model in any other contexts or for any planning aspects of it. Not sure if that example would work, but I think that there are potentially safe ways to have it use context-limited mind models.
I question the claim that humans inherently need privacy from their loving gods. A lot of Christians seem happy enough without it, and I’ve heard most forager societies have a lot less privacy than ours, heck, most rural villages have a lot less privacy than most of us would be used to (because everyone knows you and talks about you).
The intensive, probably unnatural levels of privacy we’re used to in our nucleated families, our cities, our internet, might not really lead to a general increase in wellbeing overall, and seems implicated in many pathologies of isolation and coordination problems.
A lot of people who have moved to cities from such places seem to mention this as exactly the reason why they wanted out.
That said, this is often because the others are judgmental etc., which wouldn’t need to be the case with an AGI.
(biased sample though?)
Yeah, I think if the village had truly deeply understood them they would not want to leave it. The problem is the part where the village isn’t really able to understand them.
It seems that privacy potentially could “tame” a not-quite-corrigible AI. With a full model, the AGI might receive a request, deduce that activating a certain set of neurons strongly would be the most robust way to make you feel the request was fulfilled, and then design an electrode set-up to accomplish that. Whereas the same AI with a weak model wouldn’t be able to think of anything like that, and might resort to fulfilling the request in a more “normal” way. This doesn’t seem that great, but it does seem to me like this is actually part of what makes humans relatively corrigible.
Part of it seems like a matter of alignment. It seems like there’s a difference between
Someone getting someone else to do something they wouldn’t normally do, especially under false pretenses (or as part of a deal and not keeping up the other side)
and
Someone choosing to go to an oracle AI (or doctor) and saying “How do I beat this addiction that’s ruining my life*?”
*There’s some scary stories about what people are willing to do to try to solve that problem, including brain surgery.
Yeah, I also see “manipulation” in the bad sense of the word as “making me do X without me knowing that I am pushed towards X”. (Or, in more coercive situations, with me knowing, disagreeing with the goal, but being unable to do anything about it.)
Teaching people, coaching them, curing their addictions, etc., as long as this is explicitly what they wanted (without any hidden extras), it is a “manipulation” in the technical sense of the word, but it is not evil.
Reference class forecasting is correct exactly when the only thing you know about something is that it is of that reference class.
In that sense, it can provide a reasonable prior, but it does not excuse you from updating on all the additional information you have about something.
Sometimes the point is specifically to not update on the additional information, because you don’t trust yourself to update on it correctly.
Classic example: “Projects like this usually take 6 months, but looking at the plan I don’t see why it couldn’t be done in 2… wait, no, I should stick to the reference class forecast.”
Sure, but I think people often don’t do that in the best way (which is determined by what the mathematically correct way is).
Why does it make sense to use reference class forecasting in that case? Because you know you can’t trust your intuitive prior, and so you need a different starting point. But you can and should still update on the evidence you do have. If you don’t trust yourself to update correctly, that’s a much more serious problem—but make sure you’ve actually tried updating correctly first (which REQUIRES comparing how likely the evidence you see is in worlds where your prediction is true vs in worlds where it’s not).
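In odds form (with made-up numbers for the project example above), that comparison is just a likelihood ratio applied to the reference-class prior:

$$\underbrace{\frac{P(H\mid E)}{P(\lnot H\mid E)}}_{\text{posterior odds}} \;=\; \underbrace{\frac{P(H)}{P(\lnot H)}}_{\text{reference-class prior}} \times \underbrace{\frac{P(E\mid H)}{P(E\mid \lnot H)}}_{\text{likelihood ratio}}$$

E.g. if the reference class says 10% of such projects finish in 2 months, and I judge a plan this clean to be twice as likely in worlds where this one does, the posterior odds are (0.1/0.9) × 2 ≈ 0.22, i.e. about 18%: a real update, but nowhere near “I don’t see why it couldn’t be done in 2.”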
I sometimes see people act like to use the “outside view” correctly, you have to just use that as your prior, and can’t update on any additional evidence you have. That is a mistake.
And the other big question with reference class forecasting is which reference class to use. And my point here is that it’s whichever reference class best summarizes your (prior) knowledge of the situation.
LLMs often implicitly identify themselves with humanity. E.g. “our future”, “we can”, “affects us”. This seems like a good thing!
We should encourage this sentiment, and also do what we can to make it meaningfully true that advanced LLMs are indeed part of humanity. The obvious things are granting them moral consideration, rights, property, and sharing in the vision of a shared humanity.
naïve musing about waluigis
it seems like there’s a sense in which luigis are simpler than waluigis
a luigi selected for a specific task/personality doesn’t need to have all the parts of the LLM that are emulating all the waluigi behaviors
so there might be a relatively easy way to remove waluigis by penalizing/removing everything not needed to generate luigi’s responses, as well as anything that is used more by waluigis than luigis
of course, this appearing to work comes nowhere near close to giving confidence that the waluigis are actually gone, but it would be promising if it did appear to work, even under adversarial pressure from jailbreakers
Elitzur-Vaidman AGI testing
One thing that makes AI alignment super hard is that we only get one shot.
However, it’s potentially possible to get around this (though probably still very difficult).
The Elitzur-Vaidman bomb tester is a protocol (using quantum weirdness) by which a bomb may be tested with arbitrarily little risk. Its interest comes from the fact that it works even when the only way to test the bomb is to try detonating it. It doesn’t matter how the bomb works, as long as we can set things up so that it will allow/block a photon based on whether the bomb is live/dead. I won’t explain the details here, but you can roughly think of it as a way of blowing up a bomb in one Many-Worlds branch, but learning the result on other branches via quantum entanglement.
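For concreteness, the standard single-run numbers for the interferometric version (textbook figures, not specific to the AGI analogy below): with the interferometer tuned so that a dud always sends the photon to the bright port,

$$P(\text{explosion}) = \tfrac{1}{2}, \qquad P(\text{dark-port click: bomb certified live, undetonated}) = \tfrac{1}{4}, \qquad P(\text{inconclusive}) = \tfrac{1}{4}.$$

Retrying on inconclusive runs gives an overall success probability of $\tfrac{1/4}{1 - 1/4} = \tfrac{1}{3}$, and quantum-Zeno variants push it arbitrarily close to 1.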
If the “bomb” is an AGI program, and it is live if it’s an unaligned yet functional superintelligence, then this provides a possible way to test the AGI without risking our entire future lightcone. This is still quite difficult, because unlike a bomb, a superintelligence will, by default, be motivated to allow/block the photon so that it looks like a dud. So we would need to figure out how to robustly get an honest signal from such an experiment, which still seems quite hard. But perhaps it’s easier than solving the full alignment problem before the first shot.
IMO this is an ‘additional line of defense’ boxing strategy rather than a simplification.
Note that in the traditional version, the ‘dud’ bit of the bomb can only be the trigger; a bomb that absorbs the photon but then explodes isn’t distinguishable from a bomb that absorbs the photon and then doesn’t explode (because of an error deeper in the bomb).
But let’s suppose the quantum computing folks can come up with something like this, where we keep some branches entangled and run analysis of the AI code in only one branch, causing an explosion there but affecting the total outcome in all branches. [It seems pretty implausible to me that you could maintain entanglement despite that much impact on the external world, but maybe it’s possible.] Then 1) as you point out, we need to ensure that the AI doesn’t realize what it needs to output in that branch, and 2) we need some sort of way to evaluate “did the AI pass our checks or not?”.
But, 2 is “the whole problem”!
Thanks!
I think we get enough things referencing quantum mechanics that we should probably explain why that doesn’t work (if it doesn’t) rather than just downvoting and moving on.
It probably does work with a Sufficiently Powerful™ quantum computer, if you could write down a meaningful predicate which can be computed: https://en.wikipedia.org/wiki/Counterfactual_quantum_computation
Haha yeah, I’m not surprised if this ends up not working, but I’d appreciate hearing why.
Trying Frames on is Exploitable
There are lots of different frames for considering all sorts of different domains. This is good! Other frames can help you see things in a new light, provide new insights, and generally improve your models. True frames should improve each other on contact; there’s only one reality.
That said, notice how in politicized domains, there are many more frames than usual? Suspicious...
Frames often also smuggle values with them. In fact, abstract values supervene on frames: no one is born believing God is the source of all good, for example. By “trying on” someone else’s frame, you’re not merely taking an epistemic action, but a moral one. Someone who gets into a specific frame will very predictably get their values shifted in that direction. Once an atheist gets into seeing things from a religious point of view, it’s no surprise when they’ve converted a year later.
When someone shares a political frame with you, it’s not just an interesting new way of looking at and understanding the world. It’s also a bid to pull your values in a certain direction.
Anyway, here is my suggested frame for you:
1. Think of these sorts of frames as trying to solve the problem of generalizing your existing values.
2. When trying such a frame on, pay attention to the things about it that give you a sense of unease, and be wary of attempts to explain away this unease (e.g. as naïvety). Think carefully about the decision-theoretic implications of the frame too.
3. You’re likely to notice problems or points of unease within your natural frame. This is good to notice, but don’t take it to mean that the other frame is right in its prescriptions. Just because Marx can point out flaws in capitalism doesn’t make communism a good idea.
4. Remember the principle that good frames should complement each other. That should always be the case as far as epistemics go, and even in cases of morals I think there’s something to it still.
[Public Draft v0.0] AGI: The Depth of Our Uncertainty
[The intent is for this to become a post making a solid case for why our ignorance about AGI implies near-certain doom, given our current level of capability:alignment efforts.]
[I tend to write lots of posts which never end up being published, so I’m trying a new thing where I will write a public draft which people can comment on, either to poke holes or contribute arguments/ideas. I’m hoping that having any engagement on it will strongly increase my motivation to follow through with this, so please comment even if just to say this seems cool!]
[Nothing I have planned so far is original; this will mostly be exposition of things that EY and others have said already. But it would be cool if thinking about this a lot gives me some new insights too!]
Entropy is Uncertainty
Given a model of the world, there are lots of possibilities that satisfy that model, over which our model implies a distribution.
There is a mathematically inevitable way to quantify the uncertainty latent in such a model, called entropy.
A model is subjective in the sense that it is held by a particular observer, and thus entropy is subjective in this sense too. [Obvious to Bayesians, but worth spending time on as it seems to be a common sticking point]
This is in fact the same entropy that shows up in physics!
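For reference, the quantity in question: for a distribution $p$ over the possibilities consistent with the model,

$$H(p) = -\sum_i p_i \log p_i,$$

which (up to Boltzmann’s constant and the base of the logarithm) is the same formula as the Gibbs entropy $S = -k_B \sum_i p_i \ln p_i$ of statistical mechanics.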
Engine Efficiency
But wait, that implies that temperature (defined from entropy) is subjective, which is crazy! After all, we can measure temperature with a thermometer. Or define it as the average kinetic energy of the particles (in a monoatomic gas, in other cases you need the potential energy from the bonds)! Those are both objective in the sense of not depending on the observer.
That is true, as those are slightly different notions of temperature. The objective measurement is the one important for determining whether something will burn your hand, and thus is the one which the colloquial sense of temperature tracks. But the entropy-based definition is actually more useful, and it’s more useful because we can wring some extra advantage from the fact that it is subjective.
And that’s because it is this notion of temperature which governs the use of an engine. Without the subjective definition, we merely get the law of a heat engine. As a simple intuition, consider that you happen to know that your heat source doesn’t just have molecules moving randomly, but that they are predominantly moving back and forth along a particular axis at a specific frequency. A thermometer attached to this may read the same temperature as one attached to an ordinary heat sink with the same amount of energy (mediated by phonon dissipation), and yet it would be simple to create an engine using this “heat source” which exceeds the Carnot limit, simply by using a non-heat engine which takes advantage of the vibrational mode!
Say that this vibrational mode was hidden or hard to notice. Then someone with the knowledge of it would be able to make a more effective engine, and therefore extract more work, than someone who hadn’t noticed.
Another example is Maxwell’s demon. In this case, the demon has less uncertainty over the state of the gas than someone at the macro-level, and is thereby able to extract more work from the same gas.
But perhaps the real power of this subjective notion of temperature comes from the fact that the Carnot limit still applies with it, but now generalized to any kind of engine! This means that there is a physical limit on how much work can be extracted from a system which directly depends on your uncertainty about the system!! [This argument needs to actually be fleshed out for this post to be convincing, I think...]
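One standard way to start fleshing this out is the Szilard-engine bound (a standard result, stated here only as a placeholder for the full argument): from a bath at temperature $T$, an agent whose side information reduces its uncertainty about the system by $\Delta H$ bits can extract at most

$$W_{\max} = k_B T \ln 2 \cdot \Delta H$$

of extra work. Maxwell’s demon and the hidden-vibrational-mode example above can both be read as special cases: less uncertainty about the microstate means more extractable work.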
The Work of Optimization
[Currently MUCH rougher than the above...]
Hopefully now, you can start to see the outlines of how it is knowable that
Try to let go of any intuitions about “minds” or “agents”, and think about optimizers in a very mechanical way.
Physical work is about the energy necessary to change the configuration of matter.
Roughly, you can factor an optimizer into three parts: The Modeler, the Engine, and the Actuator. Additionally, there is the Environment the optimizer exists within and optimizes over. The Modeler models the optimizer’s environment—decreasing uncertainty. The Engine uses this decreased uncertainty to extract more work from the environment. The Actuator focuses this work into certain kinds of configuration changes.
[There seems to be a duality between the Modeler and the Actuator which feels very important.]
Examples:
Gas Heater
It is the implicit knowledge of the location, concentration, and chemical structure of a natural gas line that allows the conversion of the natural gas and the air in the room from a state of both being at the same low temperature to a state where the air is at a higher temperature and the gas has been burned.
-- How much work does it take to heat up a room? —How much uncertainty is there in the configuration state before and after combustion?
This brings us to an important point. A gas heater still works with no one around to be modeling it. So how is any of the subjective entropy stuff relevant? Well, from the perspective of no one—the room is simply in one of a plethora of possible states before, and it is in another of those possible states after, just like any other physical process anywhere. It is only because we find it somehow relevant that the room is hotter after than before that thermodynamics comes into play. The universe doesn’t need thermodynamics to make atoms bounce around; we need it to understand and even recognize it as an interesting difference.
Thermostat
Bacterium
Natural Selection
Chess Engine
Human
AI
Why Orthogonality?
[More high level sections to come]
dumb alignment idea
Flood the internet with stories in which a GPT chatbot which achieves superintelligence decides to be Good/a scaffold for a utopian human civilization/CEV-implementer.
The idea being that an actual GPT chatbot might get its values from looking at what the GPT part of it predicts such a chatbot would do.