You see either something special, or nothing special.
Rana Dexsin
My experience in other circles with Slack and Discord is that the niche of emoji reactions is primarily non-interrupting room-sensing (there are also sillier uses in casual social contexts, but they don’t seem relevant here). I don’t feel any pressure to specifically have read something, and I haven’t observed people reading anything into failure to provide a reaction. The rare exception to the latter is when there’s clearly an active conversation going on that someone’s already clearly been active in, which can be handled by explicitly signaling departure, which was a norm in those circumstances anyway.
Non-interrupting room-sensing in a fast-flowing channel environment has generally struck me as beneficial. Being able to quickly find the topic-flow of the current conversation is important, and reactions do not have to be scanned for topic introductions. Reactions encode leafness: you can’t reply to a reaction easily, which also means giving a reaction cannot induce social pressure to reply to it. They encode weaker ties to the individual: people with the same reaction are stacked together, and it takes an extra effort to look at the list of reacting users. Differentially, reactions can also signal level of involvement: someone “conversing” in only reactions may not be up for thinking about the conversation hard enough to produce text responses, but is able to listen and give base emotional feedback (which seems to be the most relevant to the proposed uses here). It serves a similar function to scanning people’s facial expressions in a physical meeting room.
I’m very unclear on how these patterns would play out in a longer-form, more delay-tolerant environment like a comment tree. Some of the room-sensing interpretation makes less sense the less the timescale of the reactions corresponds to unconscious-emotion synchronization; there’s a lot of lost flow context.
This initially felt to me like it ignored some of the ramifications of its parent comment, but I’m also not sure the parent comment intended to imply them. So I would like to put forth the more specific idea that the line of action “there is a power imbalance, therefore, we have to amplify our motions by a large factor to counteract it, which is safe because we know we can’t do any real damage to them” may not be universally wrong but is still dangerous and, for those acting on the sort of charitability norms ESRogs/ricraz describe, requires a lot of extra scrutiny. Specifically, I think nonrigorously with medium confidence that:
This line of action can create a violence cascade if some of the assumptions are wrong. (And in this concrete context specifically, it is not clear to me that the assumptions are right enough.)
In the case of “soft power” (as opposed to, for instance, physical violence, where damage is more readily objectively measurable and is often decisive by way of shutting down capacity), this is much more true when there is a lot of “fog of war” going on, where perceptions of who has power over what and whom don’t have a lot of consensus. It is very easy to assume you’re in the weak position when you actually have more power than you think, and even if that power is only in some spheres, it can do lasting damage.
Some of the possible lasting damage is polarization cascades which operate independently of whether you can damage someone’s reputation in the “mainstream”: if each loosely-defined party over-updates on decrements to an opposing party’s reputation just among itself, this opens up a positive feedback loop.
In the case of decentralized Internet communities, it’s hard to tell how large the amplification factor is actually going to be unless there’s actually a control loop involved (such as a leader with the social credentials to say “our demands have been met, now we will stop shouting”).
In the presence of the ability of soft-power actions to “go viral” quickly and out of control from tiny sources, unilateralist’s curse amplifies all of the above for even very localized decisions about when to “put the hurt on”.
I think with less confidence that the existing polarization cascades across the Internet involve a growing memetic strain that incentivizes strategic perception of self as weak in the public sphere, so there’s some amount of “if you think you’re in the weak position and should hit back, it might also be your corrupted hardware emulating status-acquiring behavior” in there too.
At this point the specific SSC articles “Be Nice, At Least Until You Can Coordinate Meanness” and “The Toxoplasma of Rage” come to mind, but I don’t remember clearly enough whether they directly support any of this, and given Scott’s current position, I don’t feel like it would be appropriate for me to try to check directly.
I do think there are plausibly more concrete points against a “mistake theory”-like interpretation of the events. For instance, Scott reported the reporter describing an NYT policy, and others say no such policy actually exists. But the reporter could have misspoken; that would still be a legitimate grievance against the reporter, but it frames the events in a different light. Or Scott could have subtly misrepeated the information; I am sure he tries to be careful, but does he get every such fact exactly right under the heavy stress of an apparent threat?
So, I generally endorse “tread cautiously here”.
I also think Scott’s own suggestions of sending polite, private feedback to the NYT expressing disapproval of revealing Scott’s name are not unusually dangerous and do not have much potential for creating cascading damage per above, especially since “news organizations should be able to deal with floods of private feedback” is a well-established norm. So this shouldn’t be interpreted as a reason to suppress that.
Isn’t this extremely social-context-dependent? Do you mean “almost no other LW readers would agree with you on”? Or “almost nobody in the (poorly-defined) ‘mainstream’ would agree with you on”? Or “almost nobody in your ‘primary’ social group (whatever that is) would agree with you on”? Or “almost nobody in the world (to what threshold? that’s a lot of people!) would agree with you on”?
Edited to add: To make the concrete connection explicit, I can think of a number of things I believe that I wouldn’t dare say out loud on LW, and a number of things I believe that I wouldn’t dare say out loud in another very different social setting I’m attached to, but they don’t intersect much. I’m not sure I can think of much I believe where I have no social group that would agree with me.
The Latin noun “instauratio” is feminine, so “magna” uses the feminine “-a” ending to agree with it. “forum” in Latin is neuter, so “magnum” would be the corresponding form of the adjective. (All assuming nominative case.)
The tweet example indicated as “blocked” also points way past “offensive satire” to me; the description of “I can’t use this shampoo” is charitably read as pointing toward a real difference in hair-care needs which isn’t being covered by a business, plus some vent-driven/antagonistic emotional content. That’s not “unintelligent”, that’s more like “exhibiting conflict or cultural markers in a way that makes you uncomfortable”, and it aligns with culture war in an alarming way. (Of course, there can exist sites where posting such things is off-topic or otherwise outside the norm, but displaying it as connected to the ostensible purpose reads as trying to sneak in a wild claim, and the choice of example is bizarre to begin with.)
I notice that ‘ballerburg9005’ only joined today and this is their only post. My probability that this is being posted in good faith is quite low given the above. I have strong-downvoted the post.
Something I haven’t yet personally observed in threads on this broad topic is the difference in risk modeling from the perspective of the potential malefactor. You note that outside a hackathon context, one could “take a biology class, read textbooks, or pay experienced people to answer your questions”—but especially that last one has some big-feeling risks associated with it. What happens if the experienced person catches onto what you’re trying to do, stops answering questions, and alerts someone? The biology class is more straightforward, but still involves the risky-feeling action of talking to people and committing in ways that leave a trail. The textbooks have the lowest risk of those options but also require you to do a lot more intellectual work to get from the base knowledge to the synthesized form.
This restraining effect comes only partly in the form of real social risks to doing things that look ‘hinky’, and much more immediately in the form of psychological barriers from imagined such risks. People who are of the mindset to attempt competent social engineering attacks often report them being surprisingly easy, but most people are not master criminals and shy away from doing things that feel suspicious by reflex.
When we move to the LLM-encoded knowledge side of things, we get a different risk profile. Using a centralized, interface-access-only LLM involves some social risk to a malefactor via the possibility of surveillance, especially if the surveillance itself involves powerful automatic classification systems. Content policy violation warnings in ChatGPT are a very visible example of this; many people have of course posted about how to ‘jailbreak’ such systems, but it’s also possible that there are other hidden tripwires.
For a published-weights LLM run on local, owned hardware through generic code that’s unlikely to contain relevant hidden surveillance, the social risk of experimenting drops into the negligible range, and someone who understands the technology well enough may also understand this instinctively. Getting a rejection response when you haven’t de-safed the model enough isn’t potentially making everyone around you more suspicious or adding to a hidden tripwire counter somewhere in a Microsoft server room. You get unlimited retries that are punishment-free from this psychological social-risk-modeling perspective, and they stay punishment-free pretty much up until the point where you start executing on a concrete plan for harm in other ways that are likely to leave suspicious ripples.
Structurally this feels similar to untracked proliferation of other mixed-use knowledge or knowledge-related technology, but it seems worth having the concrete form written out here for potential discussion.
This is the main driving force behind why my intuition agrees with you that the accessibility of danger goes up a lot with a published-weights LLM. Emotionally I also agree with you that it would be sad if this meant it were too dangerous to continue open distribution of such technology. I don’t currently have a well-formed policy position based on any of that.
This was and is already true to a lesser degree with manipulative digital socialization. The less of your agency you surrender to network X, the more your friends who have given their habits to network X will be able to work at higher speed and capacity with each other and won’t bother with you. But X is often controlled by a powerful and misaligned entity.
And of course these two things may have quite a lot of synergy with each other.
As an autistic person, I’ve always kinda felt like I was making my way through life by predicting how a normal person would act.
I would tend to say that ‘normal’ people also make their way through life by predicting how normal people would act, trained by having observed a lot of them. That’s what (especially childhood) socialization is. Of course, a neurotypical brain may be differently optimized for how this information is processed than other types of brains, and may come with different ‘hooks’ that mesh with the experience in specific ways; the binding between ‘preprogrammed’ instinct and social conditioning is poorly understood but clearly exists in a broad sense and is highly relevant to psychological development.
Separately, though:
And I seriously had to stop and think about all 3 of these responses for hours. It is wild how profound these AI manage to be, just from reading my message.
Beware how easy it is to sound Deep and Wise! This is especially relevant in this context since the tendency to conflate social context or framing with the inner content of a message is one of the main routes to crowbarring minds open. These are similar to Daniel Dennett’s “deepities”. They are more like mirrors than like paintings, if that makes any sense—and most people when confronted with the Voice of Authority have an instinct to bow before the mirror. (I know I have that instinct!) But also, I am not an LLM (that I am aware of) and I would guess that I can come up with a nearly unlimited amount of these for any situation that are ultimately no more useful than as content-free probes. (In fact I suspect I have been partially trained to do so by social cues around ‘intelligence’, to such an extent that I actively suppress it at times.)
I have not looked into their methodology, and the 40,000 number may be wildly inflated. However, that it’s even plausible that U.S. sanctions could cause 40,000 deaths in Venezuela over the course of one year speaks to the disastrous humanitarian consequences American sanctions can have.
No, hang on. You can’t do that. That’s a classic backtrack to a dangling justification: “I don’t know whether it’s true, but doesn’t the part where I thought it seemed like it might be mean something kind of similar?” No, not really.
There’s a lot of other hyperbolic description here too that seems to be poorly justified and leans heavily on the “you are probably not being serious if you don’t think this already” tone. Doesn’t mean it’s false necessarily either, but this is sketchy.
The WHO redefinition part looked weird to me, so I tried to verify it. The 13 November text verifies at the Internet Archive—though note that the text shown in the screenshot is only the beginning of the entry. The entry contained many more paragraphs of text, but I don’t see it correcting the weird definition of “herd immunity” that it establishes at the beginning.
However, the current text as I see it live on 31 December (last updated today, apparently) is significantly different; it gives a lot of space to the benefits of vaccination, but does not phrase it in such a way as to ignore other immunity sources the way the 13 November text did, and makes it clearer that “herd immunity through vaccination” is a normative claim about actions that should be taken, not a positive or definitional claim about what herd immunity actually is. Here’s the current first paragraph, emphasis mine:
‘Herd immunity’, also known as ‘population immunity’, is the indirect protection from an infectious disease that happens when a population is immune either through vaccination or immunity developed through previous infection. WHO supports achieving ‘herd immunity’ through vaccination, not by allowing a disease to spread through any segment of the population, as this would result in unnecessary cases and deaths.
The rest of the new text more or less matches this change from the 13 November version; there is a bit about “The fraction of the population that must be vaccinated against COVID-19 to begin inducing herd immunity is not known”, but that’s several paragraphs in and I read it as pretty well-contextualized to “given that the plan is to vaccinate until we reach that point”. Here’s the first sentence from the third paragraph, emphasis mine:
Vaccines train our immune systems to create proteins that fight disease, known as ‘antibodies’, just as would happen when we are exposed to a disease but – crucially – vaccines work without making us sick.
The part I emphasized in that sentence is actually identical in the 13 November text, but badly contextualized. (The other differences in the third paragraph are immaterial to the distinction under question, consisting only of additional explanatory text—I assume to help readers who don’t have a basic gears-model of immune response and viral transmission readily in memory.)
Importantly, and to restate something from above, the third and all subsequent paragraphs are missing from the right-hand screenshot in the post, and it doesn’t look like normal truncation at a glance—the whitespace at the bottom of the screenshot visually implies that the second paragraph was the end of the entry in that version, which is false.
IA snapshots show that the 13 November text was in place up through 27 December, so perhaps not a small blip in terms of Internet time, but it does seem to have been corrected.
I was not able to verify the 9 June text, since IA shows no snapshots of this URL before October. I imagine perhaps the URL was different, and I would appreciate a hard reference if anyone has one.
There is a deleted comment parent to dxu’s which is not very obvious in the interface due to being represented by a single arrow glyph.
[Epistemic status: experience-based synthesis, likely biased]
Most of these seem reasonably sane, of course with varying levels of cultural and situational slant and specificity (as one would expect from any list like this). One of them, however, strikes me as actively dangerous in a way worth mentioning:
If you want to become funny, try just saying stupid shit until something sticks.
Doing this visibly in more sensitive or conformist social groups can be a disaster. Gaining a reputation for saying erratic things can make you the person that no one can take anywhere because you might ruin the environment at any time, and then you’re in the hole. Depending on your interpersonal goals, it may be that exiting a group like that would be a net benefit for you, but even if that’s true for you, you may want to examine those options first before playing roulette with your status.
Bouncing things off yourself doesn’t have the same problem, but seems like a much weaker way of developing a quality which is fundamentally social; it can work if you have an internal sense of what’s funny but haven’t “found” it for conscious access, but it doesn’t work if you were miscalibrated to start with. Bouncing things off trusted friends can work, but at that point you’re more likely to have already had that option saliently in mind. (Well, if you didn’t and you’re reading this, now you do.)
More specifically, I think people who are socially oblivious and think that humor will improve their standing may be likely to jump at 52, and if they are in the above situation, get hurt, with the hazard having been invisible due to the obliviousness. One might then ask why they would get marginally hurt if they were already likely to make social errors—but I think it’s possible to get by in such cases with (perhaps not consciously noticed) conditioned broad inhibitions instead… until you read something like this as “permission”.
Long before we get to the “LLMs are showing a number of abilities that we don’t really understand the origins of” part (which I think is the most likely here), a number of basic patterns in chess show up in the transcript semi-directly depending on the tokenization. The full set of available board coordinates is also countable and on the small side. Enough games and it would be possible to observe that “. N?3” and “. N?5” can come in sequence but the second one has some prerequisites (I’m using the dot here to point out that there’s adjacent text cues showing which moves are from which side), that if there’s a “0-0” there isn’t going to be a second one in the same position later, that the pawn moves “. ?2” and “. ?1” never show up… and so on. You could get a lot of the way toward inferring piece positions by recognizing the alternating move structure and then just taking the last seen coordinates for a piece type, and a layer of approximate-rule-based discrimination would get you a lot further than that.
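As a minimal sketch of that last-seen-coordinates heuristic (the move strings and the piece keying here are illustrative simplifications, not a claim about how an LLM actually represents the board):

```python
import re

def last_seen_squares(moves):
    """Crude positional heuristic over SAN-style move text: record the
    most recent destination square for each (side, piece-letter) pair.
    Ignores castling, promotion, and disambiguation entirely; pawn
    moves are keyed as 'P'. Not a chess engine, just pattern-tracking."""
    last_seen = {}
    for i, move in enumerate(moves):
        side = "white" if i % 2 == 0 else "black"  # alternating move structure
        m = re.match(r"([KQRBN]?)x?([a-h][1-8])", move)
        if m:
            piece = m.group(1) or "P"
            last_seen[(side, piece)] = m.group(2)
    return last_seen

# After 1. e4 e5 2. Nf3 Nc6, each side's knight was last seen on f3 / c6:
print(last_seen_squares(["e4", "e5", "Nf3", "Nc6"]))
```

Even something this shallow recovers a surprising amount of approximate board state from move text alone, which is the point: a lot of apparent “understanding” is available from surface regularities before any deeper mechanism is needed.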
That in turn is actually dependent on whether having your ambient thoughts occupied by YouTube is better overall than having them occupied by nothing for a while. There’s a lot of valuable background processing that I suspect gets starved by constant stimulation. Of course, carving out explicit time for reflection or for a meditation practice or similar is also something one can do.
The term “regress” sounds like it means “move down”, but instead it just means “move closer to”.
It means “return to(ward)”, with the implication that the observed difference from the mean is (partially) transient, so you’re returning to a past state. An example of why it sometimes implies “worsen” or “decrease” is that in a developmental context, most of the relevant change over time is assumed to be improvement, so a regression is by default a return to a lesser or worse state. This doesn’t necessarily invalidate what you said about it in a broader way, but that’s how the association comes out in my mind.
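A toy simulation of that “return toward a past state” reading, with purely illustrative numbers: if each observation is stable ability plus transient noise, the most extreme first observations come back toward the mean on remeasurement.

```python
import random

random.seed(0)

def observe(ability, noise=10.0):
    """One measurement: stable ability plus transient noise."""
    return ability + random.gauss(0, noise)

# Illustrative population: everyone has the same underlying ability (50),
# so all variation in any single measurement is transient noise.
abilities = [50.0] * 1000
first = [observe(a) for a in abilities]
second = [observe(a) for a in abilities]

# Take the 100 people who scored highest on the first measurement...
top = sorted(range(1000), key=lambda i: first[i], reverse=True)[:100]
mean_first = sum(first[i] for i in top) / 100
mean_second = sum(second[i] for i in top) / 100

# ...their second scores "regress" back toward the population mean of 50.
print(round(mean_first, 1), round(mean_second, 1))
```

Nothing “moved down” in any evaluative sense; the selected group was extreme partly by luck, and the luck doesn’t repeat.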
“ForgeModLoader” has an interestingly concrete plausible referent in the loader component of the modding framework Forge for Minecraft. I believe in at least some versions its logfiles are named beginning with that string exactly, but I’m not sure where else that appears exactly (it’s often abbreviated to “FML” instead). “FactoryReloaded” also appears prominently in the whitespace-squashed name (repository and JAR file names in particular) of the mod “MineFactory Reloaded” which is a Forge mod. I wonder if file lists or log files were involved in swinging the distribution of those?
You could say “why would you connect the playful and the serious” and I’d be like “they’re the same person, this is how they think, their character comes across when they play”.
This feels close to a crux to me. Compare: if you were in a theater troupe, and someone preferred to play malicious characters, would you make the same judgment?
So, it’s not a question of “playful” versus “serious” attitudes, but of “bounded by fiction” versus “executed in reality”. The former is allowed to leak into the latter in ways that are firmly on the side of nondestructive, so optional money handouts in themselves don’t result in recoil. But when that unipolar filter is breached, such as when flip-side consequences like increased moderator scrutiny also arrive in reality, not having a clear barrier where you’ve applied the same serious consideration that the real action would receive feels like introducing something adverse under false pretenses. (There is some exception made here for psychological consequences of e.g. satire.)
The modern April Fools’ tradition as I have usually interpreted it implies that otherwise egregious-seeming things done on April Fools’ Day are expected to be primarily fiction, with something like the aforementioned unipolar liminality to them.
Similarly, I think there’s something silly/funny about making good heart tokens and paying for them on April First. And yet, if someone tries to steal them, I will think of that as stealing.
Combining this with the above, I would predict TLW to be much less disturbed by a statement of “for the purpose of Good Heart tokens, we will err on the broad side in terms of non-intrusively detecting exploitative behavior and disallowing monetary redemption of tokens accumulated in such a way, but for all other moderation purposes, the level of scrutiny applied will remain as it was”. That would limit any increase in negative consequences to canceling the positive individual consequences “leaking out of” the experiment.
The other and arguably more important half of things here is that the higher-consequence action has been overlaid onto an existing habitual action in an invasive way. If you were playing a board game, moving resource tokens to your area contrary to the rules of the game might be considered antisocial cheating in the real world. However, if, while the game was ongoing, the host suddenly announced that the tokens in the game would be cashed out in currency and that stealing them would be considered equivalent to stealing money from their purse, I would expect some people to get up and leave, even if they weren’t intending to cheat, because the tradeoff parameters around other “noise” risks have suddenly been pulled out from underneath them. This is distinct from e.g. consciously entering a tournament where you know there will be real-money prizes, and it’s congruent with TLW’s initial question about opting out.
For my part, I’m not particularly worried (edit: on a personal level), but I do find it confusing that I didn’t see an explicit rule for which votes would be part of this experiment and which wouldn’t. My best guess is that it applies when both the execution of the vote and the creation of its target fall within the experiment period; is that right?
On the Kia EV6 page you link first, I think it’s pretty clear that the 350 kW value you quoted is part of the initial conditions rather than an expected draw. The interpretation I’m pointing at is “if connected to a charger with a capacity of 350 kW, the expected time is approximately 18 minutes”; the 350 kW is on the LHS of the conditional, as signaled by its position in the text. Compare the nearby text: the entry immediately above the one you quoted states 73 minutes under the condition of being connected to a Level 3 charger (the same fast-DC type) with a lower capacity tier of 50 kW, and the entries above that display corresponding “if provided with” → “duration will be” pairs for weaker and easier-to-install charging pathways, down to household AC taking the longest. This would make no sense if each of the wattage values reflected the battery’s internal acceptance rate. Note that the text as you quoted it is not visible to me as a straight-running block of plain text on the linked page; instead, “DC Fast Charge Time (10-80% @ 350 kW via Electric Vehicle Supply Equipment) Level 3 Charger” is the opening line of an expandable box whose body content is “Approx. 18 min.”, and that presentation makes the conditional structure clearer.
The Tesla Model 3 page states “max” for its 250 kW figure, whereas the P3 page is clear that the car “only achieves an average charging power of 146 kW” (emphasis mine), and the associated graph does show 250 kW being drawn at the initial charge state of 10%, then decreasing as the battery charge increases.
The Hyundai Kona and IONIQ pages are similar to the Kia EV6 page: the kilowatts are in the if-part of the conditional, and the measurements listed in the body cells to be relied on as outputs are the minutes.
The VW ID.3 brochure I also read as having the kilowatts in the if-part, though the text is laid out less clearly. Also, I only see your 125 kW figure mentioned in the context of the 77 kWh battery option whereas the P3 report specifies that they used one of the 58 kWh battery options.
In general, what information flow will the median consumer use? Not “do a bunch of division that’s going to be wrong because of other variability in how batteries work”. “Show me how long it takes with this type of charger” is the information that’s closest to their planning needs. The Tesla page is unusual here compared to the others, but “will a higher-capacity charging feed than X reduce my charging time” is a plausible second most relevant question and is answered better by the max than by the average (if you provide less than a 250 kW capacity, some part of the charge curve will take extended time by comparison to the nominal one).
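To make the “do a bunch of division” failure mode concrete (all numbers below are hypothetical, not taken from any of the linked pages): dividing a session’s energy by the charger’s nameplate wattage gives an optimistic floor, and the real tapering charge curve then stretches it.

```python
def naive_charge_minutes(capacity_kwh, start_frac, end_frac, nameplate_kw):
    """Naive estimate: assume the battery accepts the charger's full
    nameplate power for the entire session (it won't, due to tapering)."""
    energy_kwh = capacity_kwh * (end_frac - start_frac)
    return energy_kwh / nameplate_kw * 60

def avg_power_kw(capacity_kwh, start_frac, end_frac, actual_minutes):
    """Back out the average power actually achieved over a session."""
    energy_kwh = capacity_kwh * (end_frac - start_frac)
    return energy_kwh / (actual_minutes / 60)

# Hypothetical 77 kWh pack, 10-80% session on a 250 kW-capable charger:
print(naive_charge_minutes(77, 0.10, 0.80, 250))  # optimistic floor in minutes
print(avg_power_kw(77, 0.10, 0.80, 30))           # avg kW if it really took 30 min
```

The gap between the naive division and the observed session time is exactly the variability a “time at charger capacity X” table hides from the consumer on purpose, because the table answers the question they actually have.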
Interesting. I’ll want to look back at this later; it seems like I partially missed the point of your original post, but also it seems like there are potentially productive fuzzy conversations to be had more broadly?
To one aspect, and sorry in advance if I get rambly since I don’t have much time to edit down:
I’m not quite sure what cycles you were referring to (do you have examples?),
In short: the location/finance/legibility spiral, the employment/employment/legibility spiral, and the enormous energy needed to get back up if you fall down a class-marker level in enough ways. I don’t think I can expand on that without getting into the personal-ish version, so I’ll just go ahead and let you adjust/weaken for perspective. There’s a lot of potential fog of “which bits of world are accessible” here (but then, that’s to some degree part of the phenomenon I’m gesturing at, too).
Preamble: if you care about being influential then your problems skew a lot more toward a social-reality orientation than if you primarily care about doing good work in a more purely abstract sense. I decided long ago for myself that not caring enough about being influential in a more direct sense was likely to create problems with misinterpretation and value skew where even if I did work that had a shot at making an impact on the target, the effective result of any popularization of it might not be something I could meaningfully steer. In particular, this means I don’t expect the “live cheaply somewhere remote and put out papers while doing almost all my collaboration electronically” approach to work, at least at this point in my career.
Caveat: currently, I think I’ve likely overshot in terms of mindset for medium-term benefit even in terms of social reality (mostly due to risk aversion of the kind you disapprove of and due to the way a bunch of signaling is anti-inductive). I am deeply conflicted as to how much to backtrack on or abandon.
First cycle: Several social and career needs might be better met by moving to a specific place. That place has a high cost of living due to big-city amplification effects, which is a big obstacle in itself—but it’s not just the cost, but things like default tenant-landlord relationships and the signaling involved in that. It’s having the pay stub so you can qualify to rent housing, and having that pay stub come from the right place, and so on. Ability to work around this is limited; alternative documentation usually requires an order of magnitude longer of demonstrated, documented stability, and gaining access via local networks of people has a bootstrapping problem.
Second cycle: I see a lot of talk around some labor markets (especially in software work, which seems very common in this social sphere) currently being heavily worker-tilted, but I’ve still not seen much way to get in on skills alone, especially because it’s not just technical skill, it’s the remaining 90% of the work that involves having practiced collaborating and taking roles in an organization in the ‘right’ way so they don’t have to completely train you up for that. There’s plenty of market for people with three years of legible, verifiable, full-time experience, and almost nothing otherwise. This is classic “you need a job to get a job”, and if your existing role is of the wrong kind, you’re on the treadmill of that alternate track and need a massive pile of slack to switch around.
The above two amplify each other a lot, because the former of them gives you a lot of random-chance opportunity to try to get past barriers to the latter and the latter gets you the socioeconomic legibility for the former. For some corroboration, Patrick McKenzie talks about hiring in the software industry: (1), (2), (3) with some tactics for how to work within this. He specifically notes in (3) that “plausible” is essentially binary and recommends something congruent with your “It’s probably easier for me to de-wheel at the current point, already having some signalling tools, then it is for the average person to de-wheel.” in terms of getting past a threshold first (which is similar to the type of advice you get upset at in the OP).
Now, if you’re talking from purely an alignment perspective, and most work in alignment is currently theoretical and doesn’t benefit much from the above, and organizations funding it and people doing it manage to avoid adhesion to similar phenomena in selection, then you have a much better case for not caring much.
I’m personally a STEM student at a fancy college with a fancy (non-alignment) internship lined up.
Being a student is notably a special case that gets you a lot of passes, because that’s the perceived place in life where you’re ‘supposed to’ not have everything yet. Once you’re past the student phase, you get very little slack. This is especially true in terms of lack of external-system slack in mentoring/integration capacity—the above induction into the ‘right kind’ of experience is slack that is explicitly given to interns, but then selecting for anyone else alongside that is expensive, so if they can fill all their intake needs from student bodies, and you don’t have the “I am a student” pass yourself, you lose by default.
Something about this feels off to me. One of the salient possibilities in terms of technology affecting romantic relationships, I think, is hyperspecificity in preferences, which seems like it has a substantial social component to how it evolves. In the case of porn, with (broadly) human artists, the r34 space still takes a substantial delay and cost to translate a hyperspecific impulse into hyperspecific porn, including the cost of either having the skills and taking on the workload mentally (if the impulse-haver is also the artist) or exposing something unusual plus mundane coordination costs plus often commission costs or something (if the impulse-haver is asking a different artist).
With interactively usable, low-latency generative AI, an impulse-haver could not only do a single translation step like that much more easily, but iterate on a preference and essentially drill themselves a tunnel out of compatibility range. No? That seems like the kind of thing that makes an order-of-magnitude difference. Or do natural conformity urges or starting distributions stop that from being a big deal? Or what?
Having written that, I now wonder what circumstances would cause people to drill tunnels toward each other using the same underlying technology, assuming the above model were true…