I am not one of the tagged people but I certainly would not so agree. One reason I would not so agree is because I have talked to leftist people (prominence debatable) who celebrated the 10/7 attacks, and when I asked them whether they support Hamas, they were coherently able to answer “no, but I support armed resistance against Israel and don’t generally condemn actions that fall in that category, even when I don’t approve of or condone the group organizing those actions generally.” One way to know what people believe and support is to ask them. (Of course, I don’t think this is a morally acceptable position either, and conversation ensued! But it’s clearly not “supporting Hamas” in any sense that can support your original claims.)
My social circles also include many leftists, including student organizers and somewhat well-known online figures, so I separately suspect that you’re vastly overestimating the proportion of self-identified leftists who celebrated the attacks in any meaningful sense, but that’s probably not the crux here.
> I have a map of the world. I live on it.
I assume this is in the style of Steven Wright? It is in fact just a Steven Wright joke.
I think some of these are funny but most are quite bad. The rest of this comment is just my appraisal of the jokes I thought were interesting. These two:
> A man spends fifteen years writing an 800,000 word rationalist novel. It’s about how to make good decisions. He posts it for free. Seven people finish it. Three of them become his enemies.
> Every great thinker has one weird fan who understands them better than anyone and also cannot be allowed near the main account.
are the funny ones that, as far as I can tell, are original; they have some structural/pacing issues, but they work. This one:
> There’s a type of online guy whose whole thing is being slightly ahead of the curve. Not far enough to be a visionary. Just enough to be annoyed at everyone else for six months until they catch up. Then he moves on to being annoyed about the next thing. He’s never happy. He’s always right.
has a good setup but does not deliver. This one:
> The thing about having a nemesis is that you have to keep it proportional. Too much energy and you look obsessed. Too little and it’s not a nemesis, it’s just a guy you don’t like. The sweet spot is thinking about them exactly as often as they think about you, which means you’re both trapped forever.
is structurally the best (actually has a button!) but damn is it just too wry to work. The rest are quite bad. I continue to think that frontier models are basically unfunny, but Claude is the least unfunny. (This was true when I checked GPT-5 vs. Sonnet 4.5 vs. Gemini 2.5 Pro vs. Grok 4 a month or so back; I am not convinced Opus 4.5 is funnier than Sonnet, but it understands comedic rhythm a bit better.)
Sure, this seems more plausible. I’m sure I’d still object to your understanding of some moral and practical dimensions of monogamy, but I’m also sure you’re aware of that, so talking about it is unlikely to be productive for either of us. I’d ask that you reconsider the use of the word “category” if you have this discussion with others in the future; this is just not what it means.
I agree, this is my point! If being poly means “my partner going on a date with someone else and my partner playing board games with someone else aren’t separated by a category distinction”, then I would expect there to be poly spectrum people (that is, people who understand these categories the same way you do and identify themselves as poly) who treat these things as if they’re in the same category; that is, who treat them both as a valid place to have a relationship boundary if there’s mutual agreement that this is the best way forward. But I’m not aware of any poly people who do this. A person who is fine with their partner dating others but maybe not going home with them is clearly some amount of poly, and a person who isn’t fine with their partner dating others but is fine with their partner having a board game night is clearly not poly. Poly people would, I think, ~all agree with this, and this is obviously a category distinction! So it seems like while poly people might not care much about the category distinction, and might treat the categories more similarly than I would, they all recognize it and use it; in fact it’s impossible to meaningfully be poly without recognizing and using it. So I’m a bit confused as to why you claim not to recognize it.
EDIT: Arguably this is a minor point. I make it anyway because I think poly people are generally somewhat to largely mistaken about what polyamory is, and this causes (a) many poly people to try to argue that monogamous relationships are fundamentally flawed and (b) many people to try to be poly when it doesn’t actually work for them. The posts that Elizabeth is responding to exhibit (a), and your original comment reiterates them (you accept as valid reasons not to be polyamorous: physical/social/emotional deficiency, and this is all). And when the justification for being poly ends up being (b) (in this case, a claim I see as being obviously wrong about whether a certain category distinction exists), this makes me worry that some people are poly as a matter of ideology rather than as a matter of preference, and so may try to convince themselves or others to be poly against preference, and in fact this is exactly what we see.
> My interpretation of polyamory is basically that “my partner went to play boardgames with friends” and “my partner is on a date with someone” are in the same category.
I think if I had this perspective I would be poly, but also I am not convinced that this is a meaningful way to understand ~any poly people? For the following reason: all of the primary poly relationships I’m aware of are pretty explicit about what they do and don’t allow—certain dates are okay, certain types of sex are okay, other things require prior notification, some things require discussion, etc. It seems like every configuration of “some types of dating and sex are okay but other types of dating and sex aren’t, or not by default” exists (which, to be clear, is cool and reasonable). But I’m not aware of any poly relationships where the rules are “we can’t date other people or have sex with other people at all but we can play board games with other people”, which makes me think that in practice, poly people recognize and use a distinction between these things.
Perhaps I’m misunderstanding what you mean by “category” here? Or perhaps the polyamory I’ve encountered just doesn’t resemble yours?
I think I have a better understanding of your position now! I’m still a bit confused by your use of the word “bad”, it seems like you’re using it to mean something other than “could meaningfully be made better”. Semantically, I don’t really know what you’re referring to when you say “the exposure itself”—the point here is that there is no such thing as the exposure itself! It is not always meaningful to split things up. There is a thing that I would call true openness and you might call something like necessary vulnerability (which you don’t necessarily need to believe exists), and that thing entails the potential for deeper social connection and the potential for emotional harm, but this just does not mean we can separate it into a connection part and a harm part. I think I’m back to my original objection basically: we should not always do goal factoring because our goals do not always factor. The point of factoring something is to break it into parts which can basically be optimized separately, with some consideration of cross-interactions, but when the cross-interactions dominate the factoring obscures the goal.
I’m also not convinced that people get confused this way? Maybe there is a way to define “bad” that makes this confusion even coherent, but I can’t think of such a way. The only way I can imagine a person endorsing the claim that the exposure itself is good is as a strong rejection of the premise that the thing that is actually good is separable from the exposure. Because, after all, if exposure under certain conditions (something like: exposure to a person I have good reason to trust, having thought about and addressed the ways it could be solvably bad, in pursuit of something I value more than I am afraid of the risk of potential pain) always corresponds with a good that is worth taking on that exposure, then every conceptually-possible version of that exposure is worth taking on net. What does it even mean to say that that category of exposure is bad if its every conceivable incarnation is net good? Maybe you can say that there’s no category difference between the sort of exposure that can be productively eliminated and the sort of exposure that can’t, but the fact that I can describe the difference between these categories seems to suggest otherwise. The only way I can see for this to fail is for the description I gave to be incoherent, which only seems possible if one of the categories is empty.
On the other hand I think many people are miscalibrated on this sort of calculation, such that they either take more or less emotional risk than they ideally would, and I explained earlier why I’m very worried about ways of thinking that tend toward underexposure and not so worried about ways that tend toward overexposure. I expect any sort of truly separate accounting to involve optimization on the risk side without consideration of the trust side, and because the effects on the trust side are subtle and harder to remember (in the sense that the sort of trust I care about is really really good in my experience, it’s the sort of thing that takes basically all of my cognition to fully experience, so when any part of my cognition is focused elsewhere I cannot accurately remember how good it is), this will tend to lose out to an unseparated approach.
(This part I have no credence to say, so feel free to dismiss it with as much prejudice as is justified, but this:

> Ok, I’m going to do this thing, and it has some exposure to harm, and that part is bad, but the exposure has some subtle positive effects too, and also it is truly eternally inseparable from the goods, and it’s worth it overall, so I’m going to do it.
really does not seem like the sort of thought process that could properly calibrate a person’s exposure to emotional risk! My extremely strong suspicion is that a person whose thought process goes like this with any frequency, even if they end up accepting the risk often enough when they think it through, is extremely underexposed to emotional risk and does not know it because unlike overexposure, underexposure is self-reinforcing.)
EDIT: I think we’ve nailed down our disagreement about the object-level thing and we’re unlikely to come to agree on that; it seems like the remaining discussion is just about which distinctions are useful. Maybe this is the same disagreement and we’re unlikely to come to agree about this either? My preference is to talk about vulnerability by default, with the understanding that vulnerability is a contingent part of certain social goods, but in some cases the vulnerability can be trimmed without infringing on the social goods, so I would talk about unnecessary or excessive vulnerability in those cases. My understanding of your preference is to talk about vulnerability by default, with the understanding that vulnerability is the (strictly bad) exposure to emotional pain that often accompanies some social interactions. But it’s at least plausible that vulnerability could be a contingent part of certain social goods, so in discussing those sorts of social goods at least as hypothetical objects, you’d refer to something like necessary vulnerability? And in cases where vulnerability could in theory be trimmed away by a sufficiently-refined self-model, but where that level of refinement is not easy to achieve and in practice the right thing to do is to proceed under the theoretically-resolvable uncertainty, something like worthwhile vulnerability? And then our disagreements in your language would be: I think that necessary vulnerability actually exists in theory, and that the set of necessary or worthwhile vulnerability is big enough that we shouldn’t separate it from primitive vulnerability, and you would take the opposite position on both of those claims. Am I understanding correctly?
I agree that this is the crux but I don’t see how this is different from what we’ve been talking about? In particular, I’m trying to argue that these notions have a big intersection, and maybe even that the second kind is a subset of the first kind (there are types of openness and trust for which we can eliminate all the excess exposure to harm, but I think they’re qualitatively different from the best kinds of openness and trust; if you think the difference is not qualitative, or that it’s obviated when we consider exposure to harm correctly, then it wouldn’t be a subset.) As a concrete example, I’m trying to argue that the sort of interaction that involves honestly exposing a core belief to another person and asking for an outside perspective, with the goal of correcting that belief if it’s mistaken, is not just practically but necessarily in the intersection (it clearly requires openness and I’m trying to argue that it also requires exposure to harm for minds worth being.) Following that, I’m trying to argue that separating these concepts is a bad idea because, while this makes it easier to talk about the sorts of excess exposure we can and should eliminate, it makes it harder to recognize the exposure that we can’t or shouldn’t eliminate, and we lose more than we gain in this trade.
I agree that you haven’t made that claim but I’m struggling to find an interpretation of what you’ve written that doesn’t imply it. In particular, in my model of your position, this is exactly the claim “vulnerability itself is bad (although it may accompany good things)” applied to the sort of vulnerability that is the risk of changing one’s identity-bearing beliefs. Maybe the following will help me pin down your position better:
> That’s the opposite of what I’m saying. I’m saying try to figure out why it’s painful—what is being damaged / hurt—and then try to protect that thing even more. Then I’m saying that sometimes, when you’ve done that, it doesn’t hurt to do the thing that previously did hurt, but there’s nothing unwholesome here; rather, you’ve healed an unnecessary wound / exposure.
I agree that this is a plausible procedure and sometimes works, but how often do you expect this to work? Is it plausible to you that sometimes you figure out why it’s painful, but that knowledge doesn’t make it less painful, and yet the thing you’re afraid of doing is still the thing you’re supposed to do? Or does this not happen on your model of identity risk and vulnerability?
EDIT: I guess I should mention that I’m aware this is the opposite of what you’re saying, and my understanding is that this is very nearly the opposite of the statement you disclaim at the end here. We agree that people should be able to change their minds, and that sometimes the process of changing one’s mind seems painful. So either people should be able to change their minds despite the risk of pain, or people should be able to rearrange their mind until the process is not painful, and if it’s the latter, then an especially well-arranged mind would be able to do this quickly and would not anticipate pain in the first place. I’m not sure where you disagree with this chain of reasoning and I’m not sure I see where you can.
I think you have the gist, yes, and I think we disagree about the frequency and strength of this harm. If someone I know well told me that they had something vulnerable to share, I’d understand them as saying (modulo different auto-interpretations of mental state) that they’re much more exposed to this specific type of harm than normal in the conversation they expect to follow. Of course other, more solvable forms of vulnerability exist, but the people I’m close to basically know this and know me well enough to know that I also know this, so when they disclose vulnerability, marginal improvements are usually not available. I also think (though I can’t be sure) that this effect is actually quite strong for most people and for many of their beliefs.
I should note: there are contexts where I expect marginal improvements to be available! For example, as a teacher I often need to coordinate make-up exams or lectures with students, and this is often because the students are experiencing things that are difficult to share. When vulnerability is just an obstacle to disclosure, I think I agree with you fully. I don’t think this case is typical of vulnerability.
I guess the last point of disagreement is the claim that this is something most people should try to fortify against over time. More concretely, that most people I interact with should try to fortify against this over time, on the assumptions that you accurately believe that people in your social sphere don’t experience this type of harm strongly, that I accurately believe that people in my social sphere do experience it strongly, and that if you believe most people in your sphere should tone it down, you’d believe so even more strongly for people in my sphere.
For me, this type of fear is a load-bearing component in the preservation of my personal identity, and I suspect that things are similar for most people. I don’t think it’s a coincidence that the rationalist community has very high rates of psychosis and is the only community I’m aware of that treats unusual numbness to this sort of pain as an unalloyed and universal virtue! I think most people would agree that it’s good to be able to change your mind even when it’s painful, especially when it’s painful. But for most communities, the claim that it shouldn’t be painful to change your mind on a certain subject coincides with the claim that that subject shouldn’t be a core pillar of one’s identity. The claim that it shouldn’t be painful to change your mind on any subject, that the pain is basically a cognitive flaw, albeit understandable and forgivable and common, seems unique to this community.
(Also sorry for sentence structure here, I couldn’t figure out how to word this in a maximally-readable way for some reason. Thank you for reading me closely, I appreciate the effort.)
> So for example this could recommend noticing the exposure and studying it and sometimes marginally decreasing it, even as you’re still taking it on, if you can’t get rid of it entirely without also jettisoning some other precious stuff.
I agree with this, I think our disagreement is mainly about how much we expect to decrease this exposure before we start jettisoning the precious stuff.
> In this example, I would in real life become genuinely curious as to why the belief is comforting, in what manner it is comforting, what would be potentially harmful about having the belief contradicted, and how to avoid that harm.
I agree that this is a reasonable way to avoid some unnecessary risk, but the examples you give seem odd to me. Maintaining beliefs is often comforting because of positionality. A good reasoner should be highly willing to change their mind on anything given the right circumstances, but a good reasoner who found themselves constantly changing their mind and rarely anticipating it would start worrying about hypotheses like “I am fundamentally detached from reality and unable to reliably distinguish truth from fiction” and become quite distraught. I think this is the typical way for beliefs to be comforting and this applies to basically all beliefs, so I don’t think we can expect to avoid at least some amount of harm in most instances of vulnerability. (Of course this consideration is pretty small for most questions of fact! If my friend is wrong about which toppings are available at some pizza place, I don’t expect they would suffer much positional pain from being corrected. But of course most interactions don’t involve meaningful vulnerability from either party, which is why the small vulnerability that does exist is usually not acknowledged.)
> In my idioculture, these descriptions are ambiguous between intentions I would consider good and intentions I would consider bad. Roughly, I’d say it’s very important that the action is good / makes sense / is healthy / is wholesome on the concrete object level, without the signaling stuff, in order to be a good signal.
I agree, this is why part (1) was important! Vulnerability can be used incorrectly, I’m not saying that we should pay no attention to the fact that openness induces risk. Indeed, it’s not that hard to describe types of people who consistently misuse vulnerability and cause harm. People can overshare, inappropriately disclosing information about themselves in the hope that their demonstration of vulnerability will produce a social connection, while not valuing the perspective of their counterparty enough to justify the exposure. People can also deceive (or self-deceive!), incorrectly signalling the pain they expect to experience if their perspective is challenged, either to provoke sympathy or demonstrate emotional strength. This just means that we should not be open with everyone, which is why the signal works at all.
> I think this is totally deeply incorrect. You can simply invest your efforts to help the other person in a healthy way. Another way is trusting / relying on the other person, including in exposure to risk of harm, when that exposure is required by the task. For example, rock climbing with ropes where you rely on your belayer. Or starting a company, raising a child, etc.
Relying on each other is not the sort of social bond I have in mind here. Rock climbing or starting a business are excellent demonstrations of coincidence of interests or goals, and this produces some sort of social bond, but it’s not the same sort of social bond that vulnerability produces and is not a sufficient replacement, at least in my experience. Helping others and being helped in return, similarly, produces a social bond, but does not replace the need for vulnerability. These can be entryways, and indeed, most close friendships and relationships that I’m aware of began with a coincidence of interests and progressed to joint projects and mutual favors before expressions of vulnerability. But I’m not aware of any close friendships or healthy relationships (in my estimation of what “close” and “healthy” mean) that did not, at some point, involve unguarding, and as far as I can tell, this is where closeness actually begins. Raising a child together can probably produce this type of social bond, but if two (or more I guess) people consistently assess that they’re more scared of the other’s judgment than they are interested in the other’s potentially judgmental opinion on topics they care about, or if they’re only able to solicit the other’s opinion because the other person’s evaluation of them doesn’t feed into their sense of self enough that it could sting, I really really really don’t think those people should raise a child together.
(I guess I should mention the following: of course any starting point can work for any task. If you have enough foresight and are sufficiently good at weighing costs and benefits, you can start by trying to assess the appropriate amount of emotional risk and end up with a perfect policy. However, in this instance, under-risking is much worse than over-risking because it is self-insulating. In my experience, people who are too eager to demonstrate emotional vulnerability get lots of social feedback and settle into a more sustainable and healthy pace pretty quickly. Meanwhile those who are too timid can spend years and decades failing to find friendships that sustain them, and because they less often engage in the vulnerable practice of soliciting outside views from a person they care enough about to take seriously on matters of the self, they often don’t know that things can be different. We agree in principle that some amount of risk is justified but not all risk is justified, and because the evaluation of these quantities varies so much from situation to situation, I doubt we’ll be able to sketch out an example of explicit disagreement in enough detail to be fully sure that we disagree about the object-level best policy for the people in the example. The main reason I’m objecting this strongly is that I expect that, to a person who already under-risks, the framing and examples you provide will systematically recommend under-risking. An over-risker might apply the same framing and not end up with the same bias, but I think we should worry much less about how over-riskers will receive our advice on this topic, because over-riskers for the most part don’t need advice.)
I think I can, let me know if this explanation makes sense. (If not then this is probably also the reason I didn’t understand your clarification. Also this ended up pretty long so I probably underexpressed originally, sorry about that.)
What I mean here is that we shouldn’t try to separate openness from vulnerability because openness can’t exist without vulnerability. What do we hope to gain from openness? I think there are basically three answers. We might be embarrassed by our interests, but know that we could benefit from those interests being known to certain people. We might want an outside perspective on a personal matter, because often we’re too close to ourselves to evaluate our situation or our actions reasonably. We might want to make a costly social display, signalling our emotional investment in a particular relationship by demonstrating parts of ourselves that we wouldn’t demonstrate to somebody we weren’t invested in. In practice we’re usually doing a combination of all three of these when we make displays of vulnerability.

Which of these can be done without some form of personal risk? We can probably do the first one: in unusually sex-positive communities, for example, people can often disclose their fetishes relatively easily and without much fear of backlash. As a more mundane example, a person might not want to talk about anime with their coworkers, but be excited to talk about it at an anime convention.
The other two I think require risk to be worthwhile. When we seek the counsel of others, not merely their expertise, we are risking the comforting belief that we understand something or are justified in our actions. In an exchange like this:
> I might even respond to “I have something vulnerable to say” with “Oh ok, I’m happy to listen, but also I’d suggest that we could first meditate together for just a bit on what is literally vulnerable about it, circumspectly, and see if we can decrease that aspect”

If I were the other party, I think this response would make it difficult to access openness in the ensuing conversation. When I say “I have something vulnerable to say”, I might mean a few different things, but they’re almost all of the flavor “I want your perspective on a topic where I have trouble trusting my own perspective. It would be temporarily painful for me if your perspective were to differ much from my own, but I find you some combination of safe enough to talk to and insightful enough to be worth talking to that I would like you to give me your true perspective. From this I hope (1) to achieve a better understanding of my own circumstances, even at the cost of being upset for a while, (2) to show you that you’re important to me in a way that I’m not incentivized to fake, and (3) to show you that I am the sort of person who cares about more than just my own perspective. If your perspective does differ significantly from mine, I hope you will be careful but honest when you explain that to me.”
Maybe there is a sort of person who can always get (1) without asking for (2) and (3); that is, can have at least some of the goods without any of the bads! This sort of person would need to be unusually resistant to, perhaps immune to, embarrassment or judgment. They would be able to communicate lots of relevant facts about themselves, even those which other people might hesitate to communicate, and would be perfectly willing to accept a different point of view without even a hint of regret or attachment to their old perspective. But this person wouldn’t be able to make costly signals to demonstrate genuine social connection. I find it hard to imagine what emotional closeness even could look like for a person like this, and I struggle to describe the life I imagine they would lead as one involving anything I recognize as “openness”.
(Moreover, I’m not convinced that this is a way someone can be while still having any sort of personal identity whatsoever. Admittedly I haven’t known many people who tried to be this way, but the couple I have known were extraordinarily emotionally dangerous people, did quite a bit of social manipulation and when confronted seemed unable to understand what social manipulation even is, and ended up suffering mental breaks, although of course I can’t be completely sure of their mental states and can’t speak at all to causation.)
To sum up the most important points: I think deep social bonds (those built out of justified belief in mutual care) are inherently vulnerable. They don’t just coincide with vulnerability, they are made of it. My thoughts and my self-perceptions can cause me pain. If I try to ensure that another person cannot cause me pain, or can cause me as little pain as possible while still giving me whatever social goods I can get risklessly from them, then I’m almost by definition trying to keep them as separate from my thoughts and my self-image as I can, and this seems synonymous with trying not to care about them. There are some social goods that can be had risklessly in certain contexts, and it’s worthwhile to think about how often we want to be in those contexts and how much we value those goods, but the answers should probably be “occasionally” and “not much, relatively”. If we want to be more open and authentic around others and to get more of the social goods we derive from openness and authenticity, then focusing on the evasion of vulnerability is very nearly the worst possible approach.
Maybe you can clarify what you mean by “bad” here? This does not change my understanding of your original comment or my objections to it. I assume by “the following sentences” you’re referring to these:
> I assume what people mean is “open, authentic, unguarded, in a way that exposes you to being wounded [which is not a coincidence because being open etc. tends to be exposing]”. That, of course, is a mixed bag; the open authentic part can be good, the literal vulnerability part is bad. This also suggests a general goal-factoring move of figuring out what parts/aspects of the “be vulnerable” action-package are exposing you to harm, and which ones are getting the benefits of “open etc.”, and then trying to construct action-packages (including mental motions) that get goods + not bads. I don’t have much more to say, except that IME simply having this frame helps with motivation and coordination.
And I think the first of these sentences is ambiguous (if by “tends to be” you mean “must be” then it’s true, otherwise it’s false) and the rest are unambiguously false, especially the claim that this provides more productive motivation for dealing with potentially vulnerable situations.
> Thus, logically, vulnerability is bad as such.
This does not follow. Indeed I think it’s wrong. We can decrease vulnerability in some productive ways, for example by establishing emotional stability and establishing support systems, but we cannot get rid of all of it. Put more explicitly, the “openness” benefits we get from vulnerability are precisely the benefits of making ourselves open to some amount of harm, leaning into the true and important fact that because we care about others, the opinions and judgment of others can hurt us. This extension does not factor.
I think we should be careful with the word “is”. Here I think you mean “entails”, which is not the same thing. I also don’t think this is true for all people, but that’s a separate point and I care less about making it.
I see 5% here, with another 7% who say they don’t know? Perhaps we’re looking in different places. In any case, maybe it would have been more precise to say that I have no desire to be a woman despite sharing these experiences and motivations, so I suspect something is missing in this explanation. Good clarification.
Interesting and informative, thank you for sharing. I’ve suspected that something along these lines was a better explanation than AGP and it’s useful to have a detailed/complete version written down.
I think there are two things I should mention though? First, this falls into a broader class of origin theories for transness along the lines of “person has a certain social role, wants a different social role, develops an identity capable of making the journey between those roles, and because gender is complicated adopting that identity may involve taking on a different gender.” This characterization is too broad to be useful for tasks like prediction (relies too much on internality, so less useful externally), but I think it’s basically correct in all cases and gives us the correct sort of path-dependence and prevalence among relative outcasts and age dynamics. It’s also useful as a depathologizing tool: it seems to me, and I could be misreading, that you’re treating the desire to be loved in a certain way as pathological, and I do not think this is a correct way to understand your desires.
The other thing is that this theory must somehow be incomplete for some of the same reasons that AGP must be incomplete. Namely, as a ~completely cisgender man, I have felt this:
> Then, for path-dependent reasons, they sometimes just happen to be exposed to the concept of transgenderism, specifically in a way that makes it seem like a privileged or socially encouraged strategy for getting the love and acceptance they’ve been deprived of.
(by “this” I mean the yearning to be loved/desired in the way women often are and to be socially championed and treated as inherently deserving of help in the way women often are) pretty strongly for most of my life, and I remember explicitly identifying these as things that trans women might hope to get by transitioning, and I remember encountering transness as an idea and realizing that my social circles were unusually progressive and would likely accept any gender exploration I did. But I had, and still have, no desire to transition.
Maybe the difference here is just age, maybe you were 14 and I was 15 and that extra year of life meant that your self-concept of gender was more fluid than mine, but this seems wrong to me. Of course everything is path-dependent, and if your experiences had been different at 14 maybe this particular ball wouldn’t have started rolling and your identity might be very different today, and the same is true for me. But things are not different: your ball started rolling and mine didn’t. So I suspect that womanhood was appealing to you for reasons other than just solving all of your problems, I don’t think identity-formation is best understood as a problem-solving exercise, and I suspect that (being clearly unusually good at productive introspection) you may be able to identify the thing that happened for you and not for me, prior to these motivating factors. (Incidentally, if you do identify it I’d love to be informed of that, since I am somewhat less good at introspection and so have not been able to do so.)
I don’t think this is quite what the paper shows. I will need to read more closely to be sure, so I’m not posting this as an answer.
If you know the exact last-token state for an unknown prompt (that is, the probabilities assigned to each possible next token), then just because there are countably many prompts and (abstractly; finite precision matters some amount here) uncountably many possible end states, in practice we should expect that last-token state to correspond to only one possible prompt, and we can reverse-engineer what that prompt was without too much difficulty (there is some difficulty: we don’t know the prompt length, and the math is at least a bit hard here, but it’s not that hard).
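To gesture at the dimension-counting intuition, here's a minimal runnable sketch. To be clear, this is a toy of my own construction and not the paper's setup: a tiny random recurrent "model" (the names VOCAB, DIM, and logits_of are all made up), where we enumerate every short prompt and check that no two of them produce the same final-logit vector.

```python
# Toy sketch (my construction, not the paper's model): check empirically that
# the map prompt -> final-token logits is injective for a tiny random network.
import itertools
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 5, 8                      # tiny vocabulary and hidden size
E = rng.normal(size=(VOCAB, DIM))      # token embeddings
A = rng.normal(size=(DIM, DIM)) / DIM  # recurrence weights
W = rng.normal(size=(VOCAB, DIM))      # output head (hidden state -> logits)

def logits_of(prompt):
    """Run the toy recurrent 'model' and return the final-token logits."""
    h = np.zeros(DIM)
    for tok in prompt:
        h = np.tanh(A @ h + E[tok])
    return W @ h

# Enumerate every prompt of length 1..3 over the toy vocab (5 + 25 + 125 = 155).
prompts = [p for L in (1, 2, 3)
           for p in itertools.product(range(VOCAB), repeat=L)]
outs = np.stack([logits_of(p) for p in prompts])

# If the map is injective on this set, every pairwise distance is positive.
dists = np.linalg.norm(outs[:, None, :] - outs[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)
print(f"{len(prompts)} prompts, min pairwise logit distance: {dists.min():.4f}")
```

A strictly positive minimum is what the counting argument predicts: finitely many prompts land on isolated points in a continuous logit space, so generically no two collide and the exact logits pin down the prompt.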
But this doesn’t do what you want it to do: most probability distributions on the next token are not the last-token state for any prompt, so we can’t use this to find magic prompts. The “output” of the model is not just the token it selects, it’s the full set of logits.
I don’t think I understand what you’re saying here, can you rephrase in more words?
This only works if alignment is basically intractable, right? If the problem is basically impossible for normal intelligences, then we should expect that normal intelligences do not generally want to build superintelligences. But if the problem is just out of reach for us, then a machine only slightly smarter than us might crack it. The same is basically true for capabilities.
(I should note that I think this effect is real and underdiscussed.)
Solving alignment usually means one of the following: developing an intelligence recipe which instills the resulting intelligence with arbitrary values, plus specifying human values well; or developing an intelligence recipe for which the only attractor is within the space of human values. It might be the case that, under current recipes and their nontrivial modifications, there aren’t that many attractors, but because gradient descent is not how human intelligence works, the attractors are not the same as they are for humans. That is, the first system capable of self-improvement might be able to reasonably infer that its successor will share its values, even if it can’t give its successor arbitrary values.