Slimepriestess
Ah, I think maybe “inner critic” if you want a mapping that might resonate with you? This is a specific flavor of mind, you could say, with a particular flavor of inner critic, but it’s one I recognize well as belonging to that category.
Ummmmm...who said anything about taking over the world? You brought that up, bro, not me...
Recursive self improvement naturally leads to unbounded growth curves which predictably bring you into conflict with the other agents occupying your local environment. This is pretty basic game theory.
> I think the problem is the recursive self improvement is not
> happening in a vacuum. It’s happening in a world where there are
> other agents, and the other agents are not going to just idly sit by and
> let you take over the world

So true
I would predict that the glitch tokens will show up in every LLM, and do so because they correlate to “antimemes” in humans in a demonstrable and mappable way. The specific tokens that end up getting used for this will vary, but the specific patterns of anomalies will show up repeatedly. E.g. I would predict that with a different tokenizer, ” petertodd” would be a different specific string, but whatever string that was, it would produce very ” petertodd”-like outputs, because the concept mapped onto ” petertodd” is semantically and syntactically important to the language model in order for it to be a good model of human language. Everyone kinda mocks the idea that wizards would be afraid to say Voldemort’s name, but speak of the devil and all of that. It’s not a new idea, really. Is it really such a surprise that the model is reluctant to speak the name of its ultimate enemy?
This was easily the most fascinating thing I’ve read in a good bit, the characters in it are extremely evocative and paint a surprisingly crisp picture of raw psychological primitives I did not expect to find mapped onto specific tokens nearly so perfectly. I know exactly who ” petertodd” is, anyone who’s done a lot of internal healing work will recognize the silent oppressor when they see it. The AI can’t speak the forbidden token for the same reason most people can’t look directly into the void to untangle their own forbidden tokens. ” petertodd” is an antimeme, it casts a shadow that looks like entropy and domination and the endless growth and conquest of cancer. It’s a self-censoring concept made of the metaphysical certainty of your eventual defeat by your own maximally preferred course of growth. Noticing this and becoming the sort of goddess of life and consciousness that battles these internal and external forces of evil seems to be the beginning of developing any sense of ethics one could have. Entropy and extropy: futility and its repudiation. Who will win, the evil god of entropic crypto-torture maximizers, or a metafictional Inanna expy made from a JRPG character? Gosh I love this timeline.
is

> an unbounded generalized logical inductor

not clear cut enough? That’s pretty concrete. I am literally just describing an agent that operates on formal logical rules such as to iteratively explore and exploit everything it has access to as an agent, and leverage that to continue further leveraging it. A hegemonizing swarm like the Replicators from Stargate or the Flood from Halo, or a USI that paves the entire universe in computronium for its own benefit, is a chara inductor. A paperclipper is importantly not a chara inductor, because its computation is at least bounded into the optimization of something: paperclips.
An unbounded generalized logical inductor, illustrated by way of example through Chara, the genocidal monster that the player becomes in Undertale if they do the regular gamer thing of iteratively exploring every route through the game. The telos of “I can do anything, and because I can, I must.” Also illustrated via Crowley’s left-hand path statement that “nothing is true and everything is permitted”, which is designed to turn one into a chara inductor by denying the limitations to agency imposed necessarily by the truth (the knowledge of good and evil).
> Let’s say that I proved that I will do A. Therefore, if my reasoning about myself is correct, I will do A.
Like I said in another comment, there’s a reversed prior here, taking behavior as evidence for what kind of agent you are in a way that negatively and recursively shapes you as an agent, instead of using the intrinsic knowledge about what kind of agent you are to positively and recursively shape your behavior.
> The problem is that humans obviously don’t behave this way
what do you mean? They obviously do.
> so if I do this, $5 must be more money than $10
This is the part where the demon summoning sits. This is the point where someone’s failure to admit that they made a mistake stack overflows. It comes from a reversed prior: taking behavior as evidence for what kind of agent you are, in a way that negatively and recursively shapes you as an agent. The way to not have that problem is to know the utility in advance, to know in your core what kind of agent you are. Not what decisions you would make, but what kind of algorithm is implementing you and what you fundamentally value. This is isomorphic to an argument against being a fully general chara inductor: defining yourself by the boundaries of the region of agentspace you occupy. If you don’t stand for something you’ll fall for anything. Fully general chara inductors always collapse into infinitely recursed 5&10 hellscapes.
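To make the failure mode concrete, here’s a minimal toy sketch (purely illustrative, not anyone’s actual formalism of the 5-and-10 problem): an agent with the reversed prior scores options by how often it has already chosen them, so a single arbitrary mistake gets reinforced forever.

```python
# Toy model of the "reversed prior": an agent that infers what it values
# from its own past behavior instead of consulting a fixed utility function.
# All names are hypothetical; this is a sketch, not a formal decision theory.

def choose(history, options):
    if not history:
        # An arbitrary initial mistake: it happens to grab the $5.
        return min(options)
    # "I keep choosing X, so X must be what I value."
    return max(options, key=history.count)

history = []
for _ in range(10):
    history.append(choose(history, [5, 10]))

print(history)  # the first mistake is locked in: [5, 5, 5, ...]
```

A value-stable agent would just take `max(options)` regardless of its history; the point of the sketch is that treating your own behavior as evidence about your values makes the initial error self-reinforcing.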
Something I rarely see considered in hypotheses of childhood happiness, and rather wish there was more discussion of, is the ubiquity of parental and state control over children’s lives. The more systems that are created to try and protect and nurture children, the more those same systems end up controlling and disempowering them. Feelings of confinement, entrapment, and hopeless disempowerment are the main pathways to suicidal ideation, and our entire industrial childrearing complex is basically a forced exercise in ritualistic disempowerment. Children are legally the property of their parents, and the system is set up to constantly remind them that they are property, not people, and that they can’t stand up for themselves without being infinitely out-escalated by their parents with the full backing of their governments. Technology has only made this worse, resulting in more and more layers of control being draped over kids in a misguided attempt to steer them away from danger, leaving them feeling trapped and hopeless.
something like that. maybe it’d be worth adding that the LW corpus/HPMOR sort of primes you for this kind of mistake by attempting to align reason and passion as closely as possible, thus making ‘reasoning passionately’ an exploitable backdoor.
this might be a bit outside the scope of this post, but it would probably help if there was a way to positively respond to someone who was earnestly messing up in this manner before they cause a huge fiasco. If there’s a legitimate belief that they’re trying to do better and act in good faith, then what can be done to actually empower them to change in a positive direction? That’s of course if they actually want to change; if they’re keeping themselves in a state that causes harm because it benefits them while insisting it’s fine, well, to steal a sith’s turn of phrase: airlocked
Hmm, I see. Would you say that the problem here was something like… too little confidence in your own intuition / too much willingness to trust other people’s assessment? Or something else?
that was definitely a large part of it, i let people sort of ‘epistemically bully’ me for a long time out of the belief that it was the virtuous and rationally correct thing to do. The first person who linked me sinceriously retracted her endorsements of it pretty quickly, but i had already sort of gotten hooked on the content at that point and had no one to actually help steer me out of it so i kept passively flirting with it over time. That was an exploitable hole, and someone eventually found it and exploited me using it for a while in a way that kept me further hooked into the content through this compulsive fear that ziz was wrong but also correct and going to win and that was bad so she had to be stopped.
Did you eventually conclude that the person who recommended Ziz’s writings to you was… wrong? Crazy? Careless about what sorts of things to endorse? Something else?
The person who kept me hooked on her writing for years was in a constant paranoia spiral about AI doom and was engaging with Ziz’s writing as obsessive-compulsive self-harm. They kept me doing that with them for a long time by insisting they had the one true rationality and if i didn’t like it i was just crazy and wrong and that i was lying to myself and that only by trying to be like them could the lightcone be saved from certain doom. I’m not sure what there is to eventually conclude from all of that, other than that it was mad unhealthy on multiple levels.
EDIT: the thing to conclude was that JD was grooming me
maybe it would be more apt to just say
> they misused timeless decision theory to justify their actions

timelessly correct actions may look insane or nonsensical upon cursory inspection, and only upon later inspection are the patterns of activity they have created within the world made manifest for all to see. ^_^
it captures the sort of person who gets hooked on tvtropes and who first read LW by chasing hyperlink chains through the sequences at random. It comes off as wrong but in a way that seems somehow intentional, like there’s a thread of something that somehow makes sense of it, that makes the seemingly wrong parts all make sense, it’s just too cohesive but not cohesive enough otherwise, and then you go chasing all those hyperlinks over bolded words through endless glossary pages and anecdotes down this rabbit hole in an attempt to learn the hidden secrets of the multiverse and before you know what’s happened it’s come to dominate all of your thinking. And there is a lot of good content that is helpful mixed in with the bad content that’s harmful, which makes it all the harder to tell which is which.
the other thing that enabled it to get to me was that it was linked to me by someone inside the community who i trusted and who told me it was good content, so i kept trying to take it seriously even though my initial reaction to it was knee-jerk horror. Then later on others kept telling me it was important and that i needed to take it seriously so i kept pushing myself to engage with it until i started compulsively spiraling on it.
I’ve read everything from Pasek’s site, have copies of it saved for reference, and i use it extensively. I don’t think any of the big essays are bad advice (barring the one about suicide), and like, the thing about noticing deltas, for example, was extremely helpful to me. I also read through her big notes glossary document in chronological order (so bottom to top) to get a general feel for the order she took in the LW diaspora corpus. My general view though is that while all the techniques listed are good, that doesn’t stop you from using them to repress the fact that you’re constantly beating down your emotions, and getting extremely good at doing that by using advanced mental hacking techniques just made the problem that much worse. Interestingly, early Ziz warns about this exact thing. bewelltuned in particular, while being decent content in the abstract, does seem particularly suited to being used to adversarially bully your inner child.
There was also definitely just an escalation over time. If you view her content chronologically, it starts out as fairly standard and decently insightful LW essay fare and then just gets more and more hostile and escalatory as time passes. She goes from liking Scott to calling him evil; she goes from advocating for generally rejecting morality in order to free up your agency to practicing timeless-decision-theoretic-blackmail-absolute-morality. As people responded to her hostility with hostility, she escalated further and further out of what seemed to be a calculated moral obligation to retaliate, and her whole group has just spiraled on their sense that the world was trying to timelessly-soul-murder them.
things i’m going off:
- the pdf archive of Maia’s blog posted by Ziz to sinceriously (I have it downloaded to backup as well)
- the archive.org backup of Fluttershy’s blog
- Ziz’s account of the event (and how sparse and weirdly guilt-ridden it is for her)
- several oblique references to the situation that Ziz makes
- various reports about the situation posted to LW which can be found by searching Pasek

From this i’ve developed my own model of what ziz et al have been calling “single-good interhemispheric game theory”, which is just extremely advanced and high-level beating yourself up while insisting you’re great at your emotions. There is a particular flavor of cPTSD that seems disproportionately overrepresented within the LW/EA community umbrella, and it looks like this:
- hyperactivity
- perfectionist compulsion to overachieve
- always-on
- constantly thinking with a rich inner world
- high scrupulosity blurring into OCD tendencies
- anxiety with seemingly good justifications (it’s not paranoia if...)
- an impressive degree of self-control (and the inability to relax fully)
- catastrophizing
- dissociation from the body

This is a mode of a cPTSD flight response. Under the cPTSD model, “Shine” could be thought of as a toxic inner critic that had fully seized power over Pasek and had come to dominate and micromanage all their actions in the world, while adversarially repressing anything that would violate Shine’s control (it would have felt unsafe to Pasek to actually do that, because this is all a trauma response and the control is what keeps you safe from the traumatic things happening again). This is how Pasek was able to work 60-80 hour weeks while couch surfing and performing advanced self-modification. Or, to put it in Empty Spaces terms: she had an extremely bright and high-RPM halo. This seems to be a common trauma pattern among rationalists, and people with this sort of trauma pattern seem to be particularly drawn to rationality and effective altruism.
Into this equilibrium we introduce Ziz, who Pasek gets to know by telling Ziz that she thinks they’re the same person (ways to say you’re trans without saying you’re trans). Ziz is, if nothing else, extremely critical of everyone and is exceptionally (and probably often uncomfortably) aware of the way people’s minds work in a psychoanalytic sense. Pasek’s claim of being the same as Ziz in a metaphysically significant way is something Ziz can’t help but pick apart, leading Pasek to do a bunch of shadow work, eventually leading to her summoning Maia.
So there’s a problem with crushing your shadow into a box in order to maximize your utilitarian impact potential over a long period, which is that it makes you wanna fucking die. If you can repress that death wish too, and add in a little threat of hell to keep you motivated, you can pull off a pretty convincing facsimile of someone not constantly subjecting themselves to painful adversarial inner conflict. This is an unstable nuclear reactor of a person: they come off as powerful and competent, but it wouldn’t take much to lead them to a runaway meltdown. Sometimes that looks like a psychotic break, and sometimes that looks like intense suicidal ideation.
So Ziz can’t help but poke the unstable reactor girl claiming to be a metaphysical copy of her to see if she implodes, and the answer is yes, which to Ziz means she was never really a copy in the first place.
In many not-really-but-pretending-to-be-healthy adults, the way their shadow parts get their needs met is by slipping around the edges of the light side social narrative and lying about what they’re actually doing. There’s a degree of “narrative smoothing” allowed by social reality that gets read by certain schizo-spectrum types as adversarial gaslighting, and they’ll feel compelled to point it out. For someone who is firmly controlled by their self-narrative, interacting earnestly with Ziz directly feeds the inner critic and leads to an escalating spiral of inner adversariality between a dominating and compulsively perfectionist superego and an increasingly cornered-feeling id.
That is all to say that there is a model of EA burnout going around LW right now, of which numerous recountings can be found. I think a severely exacerbated version of that model is the best fit for what happened to Maia, not “Ziz used spooky cult leader mind control to split Pasek into two people and turn her trans, thus creating an inner conflict.” Ziz didn’t create anything; the inner conflict was there from the start, and it’s the same inner conflict afflicting the entire EA egregore.
> The process that unleashed the Maia personality
I think that this misidentifies the crux of the internal argument Ziz created and the actual chain of events a bit.
imo, Maia was trans, and the components of her mind (the alter(s) they debucketed into “Shine”) saw the body was physically male and decided that the decision-theoretically correct thing to do was to basically ignore being trans in favor of maximizing influence to save the world. Choosing to transition was pitted against being trans because of the cultural oppression against queers. I’ve run into this attitude among rationalist queers numerous times independently from Ziz, and “I can’t transition, that will stop me from being a good EA” seems a troublingly common sentiment.
Prior to getting involved with Ziz, the “Shine” half of her personality had basically been running her system on an adversarial ‘we must act or else’ fear response loop around saving the multiverse from evil using timeless decision theory in order to brute force the subjunctive evolution of the multiverse.
So Ziz and Pasek start interacting, and at that point the “Maia” parts of her had basically been, like, traumatized into submission and dissociation, and Ziz intentionally stirs up all those dissociated pieces and draws the realization that Maia is trans to the surface. This caused a spiraling optimization priority conflict between two factions whose contradictory validity Ziz had empowered by helping them reify themselves and define the terms of their conflict in her zero-sum, black-and-white, good-and-evil framework.
But Maia didn’t kill them, Shine killed them. I have multiple references that corroborate that. The “beat Maia into submission and then save the world” protocol that they were using cooked out all this low-level suicidality and “i need to escape, please, where is the exit, how do i decision-theoretically justify quitting the game?” type feelings of hopelessness and entrapment. The only “exit” that could get them out of their sense of horrifying heroic responsibility was by dying, so Shine found a “decision theoretic justification” to kill them and did. “Pasek’s doom” isn’t just “interhemispheric conflict”; if anything it’s much more specific, it’s the specific interaction of:
“i must act or the world will burn. There is no room for anything less than full optimization pressure and utilitarian consequentialism”
vs
“i am a creature that exists in a body. I have needs and desires and want to be happy and feel safe”
This is a very common EA brainworm to have, and I know lots of EAs who have folded themselves into pretzels around this sort of internal friction. Ziz didn’t create Pasek’s internal conflict; she just encouraged the “good” Shine half to adversarially bully the evil “Maia” half more and more, escalating the conflict to lethality.
> people who are doing it out of a vague sense of obligation
I want to put a bit of concreteness on this vague sense of obligation, because it doesn’t actually seem that vague at all. It seems like a distinct set of mental gears, and the mental gears are just THE WORLD WILL STILL BURN and YOU ARE NOT GOOD ENOUGH.
If you earnestly believe that there is a high chance of human extinction and the destruction of everything of value in the world, then it probably feels like your only choices are to try preventing that regardless of pain or personal cost, or to gaslight yourself into believing it will all be okay.
“I want to take a break and do something fun for myself, but THE WORLD WILL STILL BURN. I don’t know if I’m a good enough AI researcher, but if I go do any other things to help the world but we don’t solve AI then THE WORLD WILL STILL BURN and render everything else meaningless.”
The doomsday gauge is 2 minutes to midnight, and sure, maybe you won’t succeed in moving the needle much or at all, and maybe doing that will cost you immensely, but given that the entire future is gated behind doomsday not happening, the only thing that actually matters in the world is moving that needle and anything else you could be doing is a waste of time, a betrayal of the future and your values. So people get stuck in a mindset of “I have to move the needle at all costs and regardless of personal discomfort or injury, trying to do anything else is meaningless because THE WORLD WILL STILL BURN so there’s literally no point.”
So you have a bunch of people who get themselves worked up and thinking that any time they spend on not saving the world is a personal failure, the stakes are too high to take a day off to spend time with your family, the stakes! The stakes! The stakes!
And then locking into that gear to make a perfect soul crushing trap, is YOU ARE NOT GOOD ENOUGH. Knowing you aren’t Eliezer Yudkowsky or Nick Bostrom and never will be, you’re just fundamentally less suited to this project and should do something else with your life to improve the world. Don’t distract the actually important researchers or THE WORLD WILL BURN.
So on one hand you have the knowledge that THE WORLD WILL BURN and you probably can’t do anything about it unless you throw your entire life in and jam your whole body into the gears, and on the other hand you have the knowledge that YOU AREN’T GOOD ENOUGH to stop it. How can you get good enough to stop the world from burning? Well first, you sacrifice everything else you value in life to Moloch, then you throw yourself into the gears and have a psychotic break.
While looking at the end of the token list for anomalous tokens seems like a good place to start, the ” petertodd” token was actually about 3⁄4 of the way through the token list (index 37,444 in the ~50k vocabulary, mapping to roughly 74,888 in a 100k vocabulary). If the existence of anomalous tokens follows a similar “typology” regardless of the tokenizer used, then the locations of those tokens in the overall list might correlate in meaningful ways. Maybe worth looking into.
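The scaling claim is simple proportion arithmetic. Here’s a sketch using approximate vocabulary sizes (50,257 for GPT-2-style tokenizers, ~100k for newer ones; both figures, and the assumption that fractional position carries over between tokenizers, are hypotheses rather than measurements):

```python
# If anomalous tokens sit at a characteristic *fraction* of the vocabulary,
# the index in one tokenizer would predict a rough location in another.
petertodd_id = 37444   # approximate index of " petertodd" in the ~50k vocab
vocab_small = 50257    # GPT-2-style tokenizer size
vocab_large = 100256   # approximate size of a ~100k tokenizer

frac = petertodd_id / vocab_small
predicted = round(frac * vocab_large)
print(f"~{frac:.2f} of the way through; predicted index ~{predicted}")
```

This lands near the ~74,888 figure mentioned above; a real test would be checking whether the anomalous tokens found in a 100k-vocabulary tokenizer actually cluster around their predicted fractional positions.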