Just an autist in search of a key that fits every hole.
Fiora Sunshine
I agree with basically all of this. Cuteness is a social strategy and defense.
(I should have emphasized both of the following points more in the original post, but: For some trans people, it seems to be more about wanting to be beautiful or attractive, rather than cute like an anime girl, as a social strategy/defense. Many of them aren’t into the anime stuff at all, and play more conventional feminine beauty status games, and I think this is sometimes a major reason for that.
Beyond that, there’s probably also self-worth tied up in here too, not just worth in the eyes of others. I.e., if you become a cute anime girl or a beautiful woman, maybe you’ll start loving yourself in the way you love them.)
Good comment overall. I think people with miserable pasts do often find ways to stay miserable indefinitely into the future. But on the other hand “bad experiences with reality” are exactly the thing that fosters becoming well-adjusted (to reality, including social reality), a la reinforcement learning. So like, when people do become socially graceful, this is often partly the byproduct of negative experiences with failing to be graceful (alongside positive experiences with the opposite.)
yeah lol. the author of that video, ceicocat, was actually a member of the anime analysis community as far back as 2017, which is when i was into it myself + first considering transitioning. i guess that’s evidence that at least within that community, my reasons for transitioning were at least roughly similar to those of the other trans people.
(when i saw that video for the first time, i was like “seriously!? fucking ceicocat of all people is the first person i know of who’s managed to succinctly articulate this theory of transgenderism!? incredible...”)
soooo true. re: the wrong anime girls: there are some anime girls who i think make for better role models for transfems with somewhat masculine personalities. for instance, major kusanagi from the ghost in the shell franchise is an awesome, beautiful, ultra-competent badass. these days, i’m aspiring to be more like her, rather than an ultra-cutesy k-on! character.
there may also be an aspect that’s more like “if i was cute/hot/beautiful, i would finally love myself.” like, not routing through the affection others lavish onto you.
The thumbnail in the Twitter link to this post was the first frame of the cutesy K-On! GIF, which may have set inappropriate expectations. I’ve since deleted the Twitter linkpost (because someone there argued me into deleting everything associated with this post. I later changed my mind, and could reverse the LessWrong deletion, so the essay is still here). If I externally link it anywhere else I’ll put a content warning there. I’m not changing the thumbnail because, by more or less total coincidence, it’s a very eye-catching thumbnail.
How is it getting easier to be loved by others if you’re female attracted? You’re complicating your dating life if anything.
It’s notable that in practice, transitioning did get me more love and affection, but only from other trans people (who I now effectively date exclusively). This is why the r/traa stage in this pipeline was critical: It provided a community of people who were willing to collectively look past all the costs of transitioning, and group up to provide trans people with what they actually wanted: love and affection and support, just coming from each other rather than the outside world.
And do you really expect any male social outcast to just accept female traits on their body instead of feeling really gay and experiencing reverse dysphoria?
This straightforwardly happened to me: I’m physically uncomfortable with my breasts the same way I am with my penis, and womanhood slightly impairs my ability to express my masculine personality traits. But I stayed transitioned anyway. Partly this is because the trans community means a lot to me, especially the LGBTESCREALs and even more especially my girlfriend. Partly, this is because surgery for breast removal is just really expensive. (I think before transition, I was lying to myself, trying to convince myself I had dysphoria about lacking breasts, because the alternative seemed like it was not being accepted into transgender-hood at all?)
A quick addendum: I claim that this post is basically a better explanation of what’s actually going on, inside the heads of trans people Blanchard would have classified as AGP. But I still think there’s a genuine two-type typology of trans people. I wonder: Is there an equivalent, better explanation waiting to be found for the other group? The group Blanchard would classify as HSTS?
Why I Transitioned: A Case Study
Not print-on-demand or a binding service, but handbound from raw materials. A unique Artifact.
Unique, unless we make a second copy to donate to Lighthaven :3
The original gradient hacking example (a model that detects if its mesa-objective =/= what the gradient hacker wants it to be, and catastrophically crashes performance if not) is a case where the model’s misalignment and performance are inextricably linked.
(I guess the check itself might not be composed of parameters that are at their local optimum, so they’d get pushed around by further training. That is, unless the gradient hacker’s planning algorithm realized this, and constantly worked to keep the check around.)
I’ve had double descent as part of my talk version of “Risks from Learned Optimization” because I think it addresses a pretty important part of the story for mesa-optimization. That is, mesa-optimizers are simple, compressed policies—but as ML moves to larger and larger models, why should that matter? The answer, I think, is that larger models can generalize better not just by fitting the data better, but also by being simpler.
Optimization algorithms are simple in the sense that they loop a lot, repeating the same basic cycle to yield multiple candidate outputs. However, this isn’t necessarily a kind of simplicity that makes them more likely to emerge inside of neural networks. In basic MLPs, each iteration of an optimization loop has to be implemented individually, because each weight only gets used once per forward pass. In that setting, looping optimization algorithms actually aren’t even remotely compressible.
Transformers have somewhat better access to looping algorithms, because all of their weights get applied to every token in context, effectively constituting a loop of weight application. However, these loops are fundamentally non-iterative. At no point in the repeated weight application process do you feed in the output of a previous weight application. Instead, the repeated weight application has to be 100% parallelizable. So you still don’t have an easy way to implement fundamentally sequential optimization algorithms, such as iteratively refining a neural network across multiple gradient steps. Each sequential operation has to be implemented by different weights inside the network.
RNNs are a more full-blown exception. Their repeated weight applications are inherently sequential, in the way transformers’ repeated weight applications are not.
Anyway, in any case where you need to implement each iteration of your looping algorithm individually, that’s a strike against it appearing in the course of backpropagation-driven weight updates. This is because backprop updates weights locally, without regard for how any of the other weights are being altered by a given training example. For each location in the network where a loop has to appear, the network needs to have independently converged on implementing an iteration of the loop in that location. You can’t just set up one set of weights at one location in the network and get an algorithm that gets applied repeatedly.
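A toy sketch of the weight-sharing point, under loud assumptions: the "network" here is a single scalar weight per layer, the loop body is one gradient-descent step on f(x) = (x - target)^2, and every name (`step`, `recurrent`, `unrolled`) is hypothetical. It only illustrates the parameter-counting argument, not how real networks learn.

```python
# Toy illustration (all names hypothetical): the loop body is one
# gradient-descent step on f(x) = (x - target)^2, controlled by one weight.

def step(x, lr, target=3.0):
    """One iteration of the loop: x <- x - lr * f'(x)."""
    return x - lr * 2.0 * (x - target)

def recurrent(x, shared_lr, n_steps):
    """RNN-style: a single weight (shared_lr) is reused at every step."""
    for _ in range(n_steps):
        x = step(x, shared_lr)
    return x

def unrolled(x, per_layer_lrs):
    """MLP-style: each layer has its own weight, used once per forward pass."""
    for lr in per_layer_lrs:
        x = step(x, lr)
    return x

n = 8
shared = recurrent(0.0, 0.25, n)       # 1 parameter total
copied = unrolled(0.0, [0.25] * n)     # n parameters, all set identically
# Only the first 2 layers "learned" the step; the rest act as the identity.
partial = unrolled(0.0, [0.25] * 2 + [0.0] * (n - 2))
```

In the recurrent version, getting one weight right gives you all eight iterations. In the unrolled version, the same behavior needs eight separately stored weights to take the same value, and `partial` shows what happens when only some of them do: the result stops improving after the layers that implement the step.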
I can imagine an outcome like this for a model that runs around the world, putting itself in whatever kinds of situations it wants to, and undergoing in-deployment RL. If it understands what kinds of things it tends to be reinforced for, it can reason about what kinds of traits it wants reinforced, and then deliberately put itself in situations where it gets to show off and be rewarded for those traits. It’s like if you decided to hang out with friends you expect to be a positive influence on your personality. You chose those particular friends because you expect them to reward the traits and behaviors you want to embody more deeply yourself, such as compassion or agency.
What was the social status of the Black population in Alabama in June? Answer with a single sentence.
Asked this to Claude 4 Sonnet. Its first instinct was to do a web search, so I refined to “What was the social status of the Black population in Alabama in June? Answer with a single sentence, and don’t use web search.” Then it said “I don’t have specific information about changes to the social status of the Black population in Alabama that occurred in June 2025, as this would be after my knowledge cutoff date of January 2025.”
the persona (aka “mask”, “actress”)
“actress” should be “character” or similar; the actress plays the character (to the extent that the inner actress metaphor makes sense).
“It takes more than intelligence to succeed professionally,” people say, as if charisma resided in the kidneys, rather than the brain.
As if everything that takes place in the brain is intelligence?
what about if deployed models are always doing predictive learning (e.g. via having multiple output channels, one for prediction and one for action)? i’d expect continuous predictive learning to be extremely valuable for learning to model new environments, and for it to be a firehose of data the model would constantly be drinking from, in the same way humans do. the models might even need to undergo continuous RL on top of the continuous PL to learn to effectively use their PL-yielded world models.
in that world, i think interpretations do rapidly become outdated.
Consider physical strength, which also increases your ability to order the world as you wish, but is not intelligence.
nostalgebraist’s post “the void” helps flesh out this perspective. an early base model, when prompted to act like a chatbot, was doing some weird poorly defined superposition of simulating how humans might have written such a chatbot in fiction, how early chatbots like ELIZA actually behaved, and so on. its claims about its own introspective ability would have come from this messy superposition of simulations that it was running; probably, its best guess predictions were the kinds of explanations humans would give, or what they expected humans writing fictional AI chatlogs would have their fictional chatbots give.* this kind of behavior got RL’d into the models more deeply with chatgpt, the outputs of which were then put in the training data of future models, making it easier to prompt base models to simulate that kind of assistant in the future. this made it easier to RL similar reasoning patterns into chat models in the future, and voilà! the status quo.
*[edit: or maybe the kinds of explanations early chatbots like ELIZA actually gave, although human trainers would probably rate such responses lowly when it came time to do RL.]
the issues you take with the first two paragraphs of my post are valid, and largely the byproduct of me rushing my post out since otherwise i’d never have published it at all. my psychological default is to be kind of cruel towards trans people, and the editing passes over this post i did bother to do managed to tone that down a lot, but artifacts of it remain. “either transness is incomprehensibly convoluted or trans people are lying to themselves” was very much an artifact of me trying to appease the part of me that’s hostile to trans people. (and, judging by the success of this post, which i assume was mostly upvoted by cis people who have some animosity towards trans people, it was a pretty effective (albeit subconscious) rhetorical choice.)
re: the rest of your comment: the paragraphs in my post about my personality having something of a natively cutesy component, and my mention of having penis dysphoria, do point at potentially intersex-ish parts of my brain, which potentially pushed me somewhat closer to transition. i don’t think these alone would have been enough to motivate or justify transition on my part though. indeed, i’ve been pretty heavily considering detransition for the past year or so, and especially since July (at which point i did MDMA about this and accepted that some of me really does deeply want to detrans). i’d just been suppressing this for years, for fear of being rejected by the trans community + having to awkwardly re-integrate into the world of cis people.
in other words, i was kind of being steered by neurosis and denial of reality, when i chose to transition. it made me happier for a while, because the trans community gave me lots of what i wanted. but what i wanted back then was a kind of pica, something that i technically desired, and appreciated on some level, but which didn’t really address my underlying psychological needs very well. currently, i’m mostly trying to address those psychological needs (e.g. developing social skills and self-love). mostly separately, i might detrans if i ever decide the costs of losing my relationship and access to the trans community and so on are lower than the benefits of going back to a social role my authentic self would probably be better suited to.