You see either something special, or nothing special.
Rana Dexsin
My experience in other circles with Slack and Discord is that the niche of emoji reactions is primarily non-interrupting room-sensing (there are also sillier uses in casual social contexts, but they don’t seem relevant here). I don’t feel any pressure to specifically have read something, and I haven’t observed people reading anything into failure to provide a reaction. The rare exception to the latter is when there’s clearly an active conversation that someone has already been participating in, which can be handled by explicitly signaling departure; that was a norm in those circumstances anyway.
Non-interrupting room-sensing in a fast-flowing channel environment has generally struck me as beneficial. Being able to quickly find the topic-flow of the current conversation is important, and reactions do not have to be scanned for topic introductions. Reactions encode leafness: you can’t reply to a reaction easily, which also means giving a reaction cannot induce social pressure to reply to it. They encode weaker ties to the individual: people with the same reaction are stacked together, and it takes an extra effort to look at the list of reacting users. Differentially, reactions can also signal level of involvement: someone “conversing” in only reactions may not be up for thinking about the conversation hard enough to produce text responses, but is able to listen and give base emotional feedback (which seems to be the most relevant to the proposed uses here). It serves a similar function to scanning people’s facial expressions in a physical meeting room.
I’m very unclear on how these patterns would play out in a longer-form, more delay-tolerant environment like a comment tree. Some of the room-sensing interpretation makes less sense the less the timescale of the reactions corresponds to unconscious-emotion synchronization; there’s a lot of lost flow context.
This initially felt to me like it ignored some of the ramifications of its parent comment, but I’m also not sure the parent comment intended to imply them. So I would like to put forth the more specific idea that the line of action “there is a power imbalance, therefore, we have to amplify our motions by a large factor to counteract it, which is safe because we know we can’t do any real damage to them” may not be universally wrong but is still dangerous and, for those acting on the sort of charitability norms ESRogs/ricraz describe, requires a lot of extra scrutiny. Specifically, I think nonrigorously with medium confidence that:
This line of action can create a violence cascade if some of the assumptions are wrong. (And in this concrete context specifically, it is not clear to me that the assumptions are right enough.)
In the case of “soft power” (as opposed to, for instance, physical violence, where damage is more readily objectively measurable and is often decisive by way of shutting down capacity), this is much more true when there is a lot of “fog of war” going on, where perceptions of who has power over what and whom don’t have a lot of consensus. It is very easy to assume you’re in the weak position when you actually have more power than you think, and even if that power is only in some spheres, it can do lasting damage.
Some of the possible lasting damage is polarization cascades which operate independently of whether you can damage someone’s reputation in the “mainstream”: if each loosely-defined party over-updates on decrements to an opposing party’s reputation just among itself, this opens up a positive feedback loop.
In the case of decentralized Internet communities, it’s hard to tell how large the amplification factor is actually going to be unless there’s actually a control loop involved (such as a leader with the social credentials to say “our demands have been met, now we will stop shouting”).
In the presence of the ability of soft-power actions to “go viral” quickly and out of control from tiny sources, unilateralist’s curse amplifies all of the above for even very localized decisions about when to “put the hurt on”.
I think with less confidence that the existing polarization cascades across the Internet involve a growing memetic strain that incentivizes strategic perception of self as weak in the public sphere, so there’s some amount of “if you think you’re in the weak position and should hit back, it might also be your corrupted hardware emulating status-acquiring behavior” in there too.
At this point the specific SSC articles “Be Nice, At Least Until You Can Coordinate Meanness” and “The Toxoplasma of Rage” come to mind, but I don’t remember clearly enough whether they directly support any of this, and given Scott’s current position, I don’t feel like it would be appropriate for me to try to check directly.
I do think there are plausibly more concrete points against a “mistake theory”-like interpretation of the events. For instance, Scott reported the reporter describing that there was an NYT policy, and others say this is not actually true. But the reporter could have misspoken; that would still be a legitimate grievance against the reporter, but it frames the events in a different light. Or Scott could have subtly misrepeated the information; I am sure he tries to be careful, but does he get every such fact exactly right under the large stresses of an apparent threat?
So, I generally endorse “tread cautiously here”.
I also think Scott’s own suggestions of sending polite, private feedback to the NYT expressing disapproval of revealing Scott’s name are not unusually dangerous and do not have much potential for creating cascading damage per above, especially since “news organizations should be able to deal with floods of private feedback” is a well-established norm. So this shouldn’t be interpreted as a reason to suppress that.
Isn’t this extremely social-context-dependent? Do you mean “almost no other LW readers would agree with you on”? Or “almost nobody in the (poorly-defined) ‘mainstream’ would agree with you on”? Or “almost nobody in your ‘primary’ social group (whatever that is) would agree with you on”? Or “almost nobody in the world (to what threshold? that’s a lot of people!) would agree with you on”?
Edited to add: To make the concrete connection explicit, I can think of a number of things I believe that I wouldn’t dare say out loud on LW, and a number of things I believe that I wouldn’t dare say out loud in another very different social setting I’m attached to, but they don’t intersect much. I’m not sure I can think of much I believe where I have no social group that would agree with me.
The Latin noun “instauratio” is feminine, so “magna” uses the feminine “-a” ending to agree with it. “forum” in Latin is neuter, so “magnum” would be the corresponding form of the adjective. (All assuming nominative case.)
The tweet example indicated as “blocked” also points way past “offensive satire” to me; the description of “I can’t use this shampoo” is charitably read as pointing toward a real difference in hair-care needs which isn’t being covered by a business, plus some vent-driven/antagonistic emotional content. That’s not “unintelligent”, that’s more like “exhibiting conflict or cultural markers in a way that makes you uncomfortable”, and it aligns with culture war in an alarming way. (Of course, there can exist sites where posting such things is off-topic or otherwise outside the norm, but displaying it as connected to the ostensible purpose reads as trying to sneak in a wild claim, and the choice of example is bizarre to begin with.)
I notice that ‘ballerburg9005’ only joined today and this is their only post. My probability that this is being posted in good faith is quite low given the above. I have strong-downvoted the post.
Something I haven’t yet personally observed in threads on this broad topic is the difference in risk modeling from the perspective of the potential malefactor. You note that outside a hackathon context, one could “take a biology class, read textbooks, or pay experienced people to answer your questions”—but especially that last one has some big-feeling risks associated with it. What happens if the experienced person catches onto what you’re trying to do, stops answering questions, and alerts someone? The biology class is more straightforward, but still involves the risky-feeling action of talking to people and committing in ways that leave a trail. The textbooks have the lowest risk of those options but also require you to do a lot more intellectual work to get from the base knowledge to the synthesized form.
This restraining effect comes only partly in the form of real social risks to doing things that look ‘hinky’, and much more immediately in the form of psychological barriers from imagined such risks. People who are of the mindset to attempt competent social engineering attacks often report them being surprisingly easy, but most people are not master criminals and shy away from doing things that feel suspicious by reflex.
When we move to the LLM-encoded knowledge side of things, we get a different risk profile. Using a centralized, interface-access-only LLM involves some social risk to a malefactor via the possibility of surveillance, especially if the surveillance itself involves powerful automatic classification systems. Content policy violation warnings in ChatGPT are a very visible example of this; many people have of course posted about how to ‘jailbreak’ such systems, but it’s also possible that there are other hidden tripwires.
For a published-weights LLM being run on local, owned hardware through generic code that’s unlikely to contain relevant hidden surveillance, the social risk of experimenting drops into negligible range, and someone who understands the technology well enough may also understand this instinctively. Getting a rejection response when you haven’t de-safed the model enough isn’t potentially making everyone around you more suspicious or adding to a hidden tripwire counter somewhere in a Microsoft server room. You get unlimited retries that are punishment-free from this psychological social risk modeling perspective, and they stay punishment-free pretty much up until the point where you start executing on a concrete plan for harm in other ways that are likely to leave suspicious ripples.
Structurally this feels similar to untracked proliferation of other mixed-use knowledge or knowledge-related technology, but it seems worth having the concrete form written out here for potential discussion.
This is the main driving force behind why my intuition agrees with you that the accessibility of danger goes up a lot with a published-weights LLM. Emotionally I also agree with you that it would be sad if this meant it were too dangerous to continue open distribution of such technology. I don’t currently have a well-formed policy position based on any of that.
This was and is already true to a lesser degree with manipulative digital socialization. The less of your agency you surrender to network X, the more your friends who have given their habits to network X will be able to work at higher speed and capacity with each other and won’t bother with you. But X is often controlled by a powerful and misaligned entity.
And of course these two things may have quite a lot of synergy with each other.
As an autistic person, I’ve always kinda felt like I was making my way through life by predicting how a normal person would act.
I would tend to say that ‘normal’ people also make their way through life by predicting how normal people would act, trained by having observed a lot of them. That’s what (especially childhood) socialization is. Of course, a neurotypical brain may be differently optimized for how this information is processed than other types of brains, and may come with different ‘hooks’ that mesh with the experience in specific ways; the binding between ‘preprogrammed’ instinct and social conditioning is poorly understood but clearly exists in a broad sense and is highly relevant to psychological development.
Separately, though:
And I seriously had to stop and think about all 3 of these responses for hours. It is wild how profound these AI manage to be, just from reading my message.
Beware how easy it is to sound Deep and Wise! This is especially relevant in this context since the tendency to conflate social context or framing with the inner content of a message is one of the main routes to crowbarring minds open. These are similar to Daniel Dennett’s “deepities”. They are more like mirrors than like paintings, if that makes any sense—and most people when confronted with the Voice of Authority have an instinct to bow before the mirror. (I know I have that instinct!) But also, I am not an LLM (that I am aware of) and I would guess that I can come up with a nearly unlimited amount of these for any situation that are ultimately no more useful than as content-free probes. (In fact I suspect I have been partially trained to do so by social cues around ‘intelligence’, to such an extent that I actively suppress it at times.)
I have not looked into their methodology, and the 40,000 number may be wildly inflated. However, that its even plausible that U.S. sanctions could cause 40,000 deaths in Venezuela over the course of one year speaks to the disastrous humanitarian consequences American sanctions can have.
No, hang on. You can’t do that. That’s a classic backtrack to a dangling justification: “I don’t know whether it’s true, but doesn’t the part where I thought it seemed like it might be mean something kind of similar?” No, not really.
There’s a lot of other hyperbolic description here too that seems to be poorly justified and leans heavily on the “you are probably not being serious if you don’t think this already” tone. Doesn’t mean it’s false necessarily either, but this is sketchy.
The WHO redefinition part looked weird to me, so I tried to verify it. The 13 November text checks out at the Internet Archive—though note that the text shown in the screenshot is only the beginning of the entry. The entry contained many more paragraphs of text, but I don’t see it correcting the weird definition of “herd immunity” that it establishes at the beginning.
However, the current text as I am seeing it live on 31 December (last update today, apparently) is significantly different; it gives a lot of space to the benefits of vaccination, but does not phrase it in such a way as to ignore other immunity sources the way the 13 November text did, and makes it clearer that the “herd immunity through vaccination” is a normative claim on actions that should be taken and not a positive or nominative claim on what herd immunity actually is. Here’s the current first paragraph, emphasis mine:
‘Herd immunity’, also known as ‘population immunity’, is the indirect protection from an infectious disease that happens when a population is immune either through vaccination or immunity developed through previous infection. WHO supports achieving ‘herd immunity’ through vaccination, not by allowing a disease to spread through any segment of the population, as this would result in unnecessary cases and deaths.
The rest of the new text more or less matches this change from the 13 November version; there is a bit about “The fraction of the population that must be vaccinated against COVID-19 to begin inducing herd immunity is not known”, but that’s several paragraphs in and I read it as pretty well-contextualized to “given that the plan is to vaccinate until we reach that point”. Here’s the first sentence from the third paragraph, emphasis mine:
Vaccines train our immune systems to create proteins that fight disease, known as ‘antibodies’, just as would happen when we are exposed to a disease but – crucially – vaccines work without making us sick.
The part I emphasized in that sentence is actually identical in the 13 November text, but badly contextualized. (The other differences in the third paragraph are immaterial to the distinction under question, consisting only of additional explanatory text—I assume to help readers who don’t have a basic gears-model of immune response and viral transmission readily in memory.)
Importantly, and to restate something from above, the third and all subsequent paragraphs are missing from the right-hand screenshot in the post, and it doesn’t look like normal truncation at a glance—the whitespace at the bottom of the screenshot visually implies that the second paragraph was the end of the entry in that version, which is false.
IA snapshots show that the 13 November text was in place up through 27 December, so perhaps not a small blip in terms of Internet time, but it does seem to have been corrected.
I was not able to verify the 9 June text, since IA shows no snapshots of this URL before October. I imagine perhaps the URL was different, and I would appreciate a hard reference if anyone has one.
There is a deleted comment parent to dxu’s which is not very obvious in the interface due to being represented by a single arrow glyph.
[Epistemic status: experience-based synthesis, likely biased]
Most of these seem reasonably sane, of course with varying levels of cultural and situational slant and specificity (as one would expect from any list like this). One of them, however, strikes me as actively dangerous in a way worth mentioning:
If you want to become funny, try just saying stupid shit until something sticks.
Doing this visibly in more sensitive or conformist social groups can be a disaster. Gaining a reputation for saying erratic things can make you the person that no one can take anywhere because you might ruin the environment at any time, and then you’re in the hole. Depending on your interpersonal goals, it may be that exiting a group like that would be a net benefit for you, but even if that’s true for you, you may want to examine those options first before playing roulette with your status.
Bouncing things off yourself doesn’t have the same problem, but seems like a much weaker way of developing a quality which is fundamentally social; it can work if you have an internal sense of what’s funny but haven’t “found” it for conscious access, but it doesn’t work if you were miscalibrated to start with. Bouncing things off trusted friends can work, but at that point you’re more likely to have already had that option saliently in mind. (Well, if you didn’t and you’re reading this, now you do.)
More specifically, I think people who are socially oblivious and think that humor will improve their standing may be likely to jump at 52, and if they are in the above situation, get hurt, with the hazard having been invisible due to the obliviousness. One might then ask why they would get marginally hurt if they were already likely to make social errors—but I think it’s possible to get by in such cases with (perhaps not consciously noticed) conditioned broad inhibitions instead… until you read something like this as “permission”.
Long before we get to the “LLMs are showing a number of abilities that we don’t really understand the origins of” part (which I think is the most likely here), a number of basic patterns in chess show up in the transcript semi-directly depending on the tokenization. The full set of available board coordinates is also countable and on the small side. Enough games and it would be possible to observe that “. N?3” and “. N?5” can come in sequence but the second one has some prerequisites (I’m using the dot here to point out that there’s adjacent text cues showing which moves are from which side), that if there’s a “0-0” there isn’t going to be a second one in the same position later, that the pawn moves “. ?2” and “. ?1” never show up… and so on. You could get a lot of the way toward inferring piece positions by recognizing the alternating move structure and then just taking the last seen coordinates for a piece type, and a layer of approximate-rule-based discrimination would get you a lot further than that.
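The “just take the last seen coordinates for a piece type” baseline described above can be made concrete. This is a hedged sketch, not anything from the original discussion: the function name and the exact handling of SAN quirks (castling, promotion, check markers) are my own illustrative choices, and the whole point is that it deliberately ignores rules, captures, and duplicate pieces.

```python
import re

def last_seen_squares(moves):
    """Crude position sketch from a SAN move list: for each side and piece
    letter, remember the most recent destination square seen.

    This is the 'approximate, rule-free' baseline from the comment above:
    it relies only on the alternating move structure and surface text,
    ignoring captures, disambiguation, castling rook placement, and the
    fact that there can be several pieces of the same type."""
    seen = {}
    for i, move in enumerate(moves):
        side = "white" if i % 2 == 0 else "black"
        if move.startswith("O-O"):
            continue  # castling: skip rather than model it
        # Destination square is the trailing file+rank, after stripping
        # promotion suffixes and check/mate markers.
        m = re.search(r"([a-h][1-8])[+#]?$", re.sub(r"=[QRBN]", "", move))
        if not m:
            continue
        piece = move[0] if move[0] in "KQRBN" else "P"
        seen[(side, piece)] = m.group(1)
    return seen
```

Even this toy tracker recovers a surprising amount of the surface regularities mentioned above (e.g. that a white pawn destination never ends in 1 or 2), which is part of why move-prediction competence alone underdetermines how much “board understanding” is present.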
That in turn is actually dependent on whether having your ambient thoughts occupied by YouTube is better overall than having them occupied by nothing for a while. There’s a lot of valuable background processing that I suspect gets starved by constant stimulation. Of course, carving out explicit time for reflection or for a meditation practice or similar is also something one can do.
The term “regress” sounds like it means “move down”, but instead it just means “move closer to”.
It means “return to(ward)”, with the implication that the observed difference from the mean is (partially) transient, so you’re returning to a past state. An example of why it sometimes implies “worsen” or “decrease” is that in a developmental context, most of the relevant change over time is assumed to be improvement, so a regression is by default a return to a lesser or worse state. This doesn’t necessarily invalidate what you said about it in a broader way, but that’s how the association comes out in my mind.
[Question] Do you consider your current, non-superhuman self aligned with “humanity” already?
“ForgeModLoader” has an interestingly concrete plausible referent in the loader component of the modding framework Forge for Minecraft. I believe in at least some versions its logfiles are named beginning with exactly that string, but I’m not sure where else the full form appears (it’s often abbreviated to “FML” instead). “FactoryReloaded” also appears prominently in the whitespace-squashed name (repository and JAR file names in particular) of the mod “MineFactory Reloaded”, which is a Forge mod. I wonder if file lists or log files were involved in swinging the distribution of those?
You could say “why would you connect the playful and the serious” and I’d be like “they’re the same person, this is how they think, their character comes across when they play”.
This feels close to a crux to me. Compare: if you were in a theater troupe, and someone preferred to play malicious characters, would you make the same judgment?
So, it’s not a question of “playful” versus “serious” attitudes, but of “bounded by fiction” versus “executed in reality”. The former is allowed to leak into the latter in ways that are firmly on the side of nondestructive, so optional money handouts in themselves don’t result in recoil. But when that unipolar filter is breached, such as when flip-side consequences like increased moderator scrutiny also arrive in reality, not having a clear barrier where you’ve applied the same serious consideration that the real action would receive feels like introducing something adverse under false pretenses. (There is some exception made here for psychological consequences of e.g. satire.)
The modern April Fools’ tradition as I have usually interpreted it implies that otherwise egregious-seeming things done on April Fools’ Day are expected to be primarily fiction, with something like the aforementioned unipolar liminality to them.
Similarly, I think there’s something silly/funny about making good heart tokens and paying for them on April First. And yet, if someone tries to steal them, I will think of that as stealing.
Combining this with the above, I would predict TLW to be much less disturbed by a statement of “for the purpose of Good Heart tokens, we will err on the broad side in terms of non-intrusively detecting exploitative behavior and disallowing monetary redemption of tokens accumulated in such a way, but for all other moderation purposes, the level of scrutiny applied will remain as it was”. That would limit any increase in negative consequences to canceling the positive individual consequences “leaking out of” the experiment.
The other and arguably more important half of things here is that the higher-consequence action has been overlaid onto an existing habitual action in an invasive way. If you were playing a board game, moving resource tokens to your area contrary to the rules of the game might be considered antisocial cheating in the real world. However, if the host suddenly announced, while the game was ongoing, that the tokens in the game would be cashed out in currency and that stealing them would be considered equivalent to stealing money from their purse, I would expect some people to get up and leave, even if they weren’t intending to cheat, because the tradeoff parameters around other “noise” risks have suddenly been pulled out from underneath them. This is as distinct from e.g. consciously entering a tournament where you know there will be real-money prizes, and it’s congruent with TLW’s initial question about opting out.
For my part, I’m not particularly worried (edit: on a personal level), but I do find it confusing that I didn’t see an explicit rule for which votes would be part of this experiment and which wouldn’t. My best guess is that it applies when both the execution of the vote and the creation of its target fall within the experiment period; is that right?
Something about this feels off to me. One of the salient possibilities in terms of technology affecting romantic relationships, I think, is hyperspecificity in preferences, which seems like it has a substantial social component to how it evolves. In the case of porn, with (broadly) human artists, the r34 space still takes a substantial delay and cost to translate a hyperspecific impulse into hyperspecific porn, including the cost of either having the skills and taking on the workload mentally (if the impulse-haver is also the artist) or exposing something unusual plus mundane coordination costs plus often commission costs or something (if the impulse-haver is asking a different artist).
With interactively usable, low-latency generative AI, an impulse-haver could not only do a single translation step like that much more easily, but iterate on a preference and essentially drill themselves a tunnel out of compatibility range. No? That seems like the kind of thing that makes an order-of-magnitude difference. Or do natural conformity urges or starting distributions stop that from being a big deal? Or what?
Having written that, I now wonder what circumstances would cause people to drill tunnels toward each other using the same underlying technology, assuming the above model were true…