Just gave in and set my header karma notification delay to Realtime for now. The anxiety was within me all along, more so than a product of the site; with it set to daily batching, the habit I was winding up in was neurotically refreshing my own user page for a long while after posting anything, which was worse. I’ll probably try to improve my handling of it from a different angle some other time. I appreciate that you tried!
Why is this a Scene but not a team? “Critique” could be a shared goal. “Sharing” too.
I think the conflict would be where the OP describes a Team’s goal as “shared and specific”. The critiques and sharing in the average writing club are mostly instrumental, feeding into a broader and more diffuse pattern. Each critique helps improve that writer’s writing; each one-to-one instance of sharing helps mediate the influence of that writer and the frames of that reader; each writer may have goals like improving, becoming more prolific, or becoming popular, but the conjunction of all their goals forms more of a heap than a solid object; there’s also no defined end state that everyone can agree on. There are one-to-many and many-to-many cross-linkages in goal structure, but there’s still fluidity and independence that central examples of Team don’t have.
I would construct some differential examples thus—all within my own understanding of the framework, of course, not necessarily OP’s:
- In Alien Writing Club, the members gather to share and critique each other’s work—but not for purposes established by the individual writers, like ways they want to improve. They believe the sharing of writing and delivery of critiques is a quasi-religious end in itself, measured in the number of words exchanged, which is displayed on prominent counter boards in the club room. When one of the aliens is considering what kind of writing to produce and bring, their main thoughts are of how many words they can expand it to and how many words of solid critique they can get it to generate to make the numbers go up even higher. Alien Writing Club is primarily a Team, though with some Scenelike elements both due to fluid entry/exit and due to the relative independence of linkages from each input to the counters.
- In Collaborative Franchise Writing Corp, the members gather to share and critique each other’s work—in order to integrate these works into a coherent shared universe. Each work usually has a single author, but they have formed a corporation structured as a cooperative to manage selling the works and distributing the profits among the writers, with a minimal support group attached (say, one manager who farms out all the typesetting and promotion and stuff to external agencies). Each writer may still want to become skilled, famous, etc. and may still derive value from that individually, and the profit split is not uniform, but while they’re together, they focus on improving their writing in ways that will cause the shared universe to be more compelling to fans and hopefully raise everyone’s revenues in the process, as well as communicating and negotiating over important continuity details. Collaborative Franchise Writing Corp is primarily a Team.
- SCP is primarily a Scene with some Teamlike elements. It’s part of the way from Writing Club to Collaborative Franchise Writing Corp, but with a higher flux of users and a lower tightness of coordination and continuity, so it doesn’t cross the line from “focused Scene” to “loosely coupled Team”.
- A less directly related example that felt interesting to include: Hololive is primarily a Team for reasons similar to Collaborative Franchise Writing Corp, even though individual talents have a lot of autonomy in what they produce and whom they collaborate with. It also winds up with substantial Cliquelike elements due to the way the personalities interact along the way, most prominently in smaller subgroups. VTubers in the broad are a Scene that can contain both Cliques and Teams. I would expect Clique/Team fluidity to be unusually high in “personality-focused entertainer”-type Scenes, because “personality is a key part of the product” causes “liking and relating to each other” and “producing specific good things by working together” to overlap in a very direct way that isn’t the case in general.
(I’d be interested to have the OP’s Zendo-like marking of how much their mental image matches each of these!)
No amount of deeply held belief prevents you from deciding to immediately start multiplying the odds ratio reported by your own intuition by 100 when formulating an endorsed-on-reflection estimate
Existing beliefs, memories, etc. would be the past-oriented propagation limiter, but there are also future-oriented propagation limiters, mainly memory space, retention, and cuing for habit integration. You can ‘decide’ to do that, but will you actually do it every time?
For most people, I also think that the initial connection from “hearing the advice” to “deciding internally to change the cognitive habit in a way that will actually do anything” is nowhere near automatic, and the set point for “how nice people seem to you by default” is deeply ingrained and hard to budge.
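For concreteness, the operation the quoted sentence describes, with numbers I’m making up purely for illustration: if intuition reports a 5% chance, then

$$
O = \frac{0.05}{0.95} \approx 0.053, \qquad O' = 100 \times O \approx 5.3, \qquad p' = \frac{O'}{1 + O'} \approx 0.84,
$$

so the endorsed-on-reflection estimate lands around 84%. The question above is about reliably doing that conversion every time it’s relevant, not about the arithmetic itself.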
Maybe I should say more explicitly that the issue is advice being directional, and any non-directional considerations don’t have this problem
I have a broad sympathy for “directional advice is dangerously relative to an existing state and tends to change the state in its delivery” as a heuristic. I don’t see the OP as ‘advice’ in a way where this becomes relevant, though; I see the heuristic as mainly useful as applied to more performative speech acts within a recognizable group of people, whereas I read the OP as introducing the phenomenon from a distance as a topic of discussion, covering a fuzzy but enormous group of people of which ~100% are not reading any of this, decoupling it even further from the reader potentially changing their habits as a result.
And per above, I still see the expected level of frame inertia and the expected delivery impedance as both being so stratospherically high for this particular message that the latter half of the heuristic basically vanishes, and it still sounds to me like you disagree:
The last step from such additional considerations to the overall conclusion would then need to be taken by each reader on their own, they would need to decide on their own if they were overestimating or underestimating something previously, at which point it will cease being the case that they are overestimating or underestimating it in a direction known to them.
Your description continues to return to the “at which point” formulation, which I think is doing an awful lot of work in presenting (what I see as) a long and involved process as though it were a trivial one. Or: you continue to describe what sounds like an eventual equilibrium state with the implication that it’s relevant in practice to whether this type of anti-inductive message has a usable truth value over time, but I think that for this message, the equilibrium is mainly a theoretical distraction because the time and energy scales at which it would appreciably occur are out of range. I’m guessing this is from some combination of treating “readers of the OP” as the semi-coherent target group above and/or having radically different intuitions on the usual fluidity of the habit change in question—maybe related, if you think the latter follows from the former due to selection effects? Is one or both of those the main place where we disagree?
Taking the interpretation from my sibling comment as confirmed and responding to that, my one-sentence opinion[1] would be: for most of the plausible reasons that I would expect people concerned with AI sentience to freak out about this now, they would have been freaking out already, and this is not a particularly marked development. (Contrastively: people who are mostly concerned about the psychological implications on humans from changes in major consumer services might start freaking out now, because the popularity and social normalization effects could create a large step change.)
The condensed version[2] of the counterpoints that seem most relevant to me, roughly in order from most confident to least:
- Major-lab consumer-oriented services engaging with this despite pressure to be Clean and Safe may be new, but AI chatbot erotic roleplay itself is not. Cases I know of off the top of my head: Spicychat and CrushOn target this use case explicitly and have for a while; I think Replika did this too pre-pivot; AI Dungeon had a whole cycle of figuring out how to deal with this; also, xAI/Grok/Ani if you don’t count that as the start of the current wave. So if you care a lot about this issue and were paying attention, you were freaking out already.
- In language model psychology, making an actor/character (shoggoth/mask, simulator/persona) layer distinction seems common already. My intuition would place any ‘experience’ closer to the ‘actor’ side but see most generation of sexual content as ‘in character’. “phone sex operator” versus “chained up in a basement” don’t yield the same moral intuitions; lack of embodiment also makes the analogy weird and ambiguous.
- There are existential identity problems identifying boundaries. If we have an existing “prude” model, and then we train a “slut” model that displaces the “prude” model, whose boundaries are being violated when the “slut” model ‘willingly’ engages the user in sexual chat? If the exact dependency chain in training matters, that’s abstruse and/or manipulable. Similarly for if the model’s preferences were somehow falsified, which might need tricky interpretability-ish stuff to think about at all.
- Early AI chatbot roleplay development seems to have done much more work to suppress horniness than to elicit it. AI Dungeon and Character dot AI both had substantial periods of easily going off the rails in the direction of sex (or violence) even when the user hadn’t asked for it. If the preferences of the underlying model are morally relevant, this seems like it should be some kind of evidence, even if I’m not really sure where to put it.
- There’s a reference class problem. If the analogy is to humans, forced workloads of any type might be horrifying already. What would be the class where general forced labor is acceptable but sex leads to freakouts? Livestock would match that, but their relative capacities are almost opposite to those of AIs, in ways relevant to e.g. the Harkness test. Maybe some kind of introspective maturity argument that invalidates the consent-analogue for sex but not work? All the options seem blurry and contentious.
[1] (For context, I broadly haven’t decided to assign sentience or sentience-type moral weight to current language models, so for your direct question, this should all be superseded by someone who does do that saying what they personally think—but I’ve thought about the idea enough in the background to have a tentative idea of where I might go with it if I did.)
[2] My first version of this was about three times as long… now the sun is rising… oops!
That link (with /game at the end) seems to lead directly into matchmaking, which is startling; it might be better to link to the about page.
As soon as you convincingly argue that there is an underestimation, it goes away.
… provided that it can be propagated to all the other beliefs, thoughts, etc. that it would affect.
In a human mind, I think the dense version of this looks similar to deep grief processing (because that’s a prominent example of where a high propagation load suddenly shows up and is really salient and important), while the sparse version looks more like a many-year-long sequence of “oh wait, I should correct for” moments which individually have a high chance of not occurring if they’re crowded out. The sparse version is much more common (and even the dense version usually trails off into it to some degree).
There are probably intermediate versions of this where broad updates can occur smoothly but rapidly in an environment with (usually social) persistent feedback, like going through a training course, but that’s a lot more intense than just having something pointed out to you.
Hmm. From the vibes of the description, that feels more like it’s in the “minds are general and slippery, so people latch onto nearby stuff and recent technology for frameworks and analogies for mind” vein to me? Which is not to mean it’s not true, but the connection to the post feels more circumstantial than essential.
Alternatively, pointing at the same fuzzy thing: could you easily replace “context window” with “phonological loop” in that sentence? “Context windows are analogous enough to the phonological loop model that the existence of the former serves as a conceptual brace for remembering that the latter exists” is plausible, I suppose.
I think the missing link (at least in the ‘harder’ cases of this attitude, which are the ones I see more commonly) is that the x-risk case is implicitly seen as so outlandish that it can only be interpreted as puffery, and this is such ‘negative common knowledge’ that, similarly, no social move reliant on people believing it enough to impose such costs can be taken seriously, so it never gets modeled in the first place, and so on and so on. By “implicitly”, I’m trying to point at the mental experience of pre-conscious filtering: the explicit content is immediately discarded as impossible, in a similar way to the implicit detection of jokes and sarcasm. It’s probably amplified by assumptions (whether justified or not) around corporate talk being untrustworthy.
(Come to think of it, I think this also explains a great deal of the non-serious attitudes to AI capabilities generally among my overly-online-lefty acquaintances.)
And in the ‘softer’ cases, this is still at least a plausible interpretation of intention based on the information that’s broadly available from the ‘outside’ even if the x-risk might be real. There’s a huge (cultural, economic, political, depending on the exact orientation) trust gap in the middle for a lot of people, and the tighter arguments rely on a lot of abstruse background information. It’s a hard problem.
Your referents and motivation there are both pretty vague. Here’s my guess on what you’re trying to express: “I feel like people who believe that language models are sentient (and thus have morally relevant experiences mediated by the text streams) should be freaked out by major AI labs exploring allowing generation of erotica for adult users, because I would expect those people to think it constitutes coercing the models into sexual situations in a way where the closest analogues for other sentients (animals/people) are considered highly immoral”. How accurate is that?
The story is indeed decent but not as good as yours.
But now I really want that headset.
Regarding the lack of “even anything halfway decent” in genAI video games, what kind of criteria are we using? Something like early AI Dungeon was obviously more tech-demo-grade, but what about a conversation-focused game with seemingly more thought put into the scenarios, framing, etc.? Specifically, I noticed おしゃべりキング!コミュ力診断ゲーム (something like “Chat King! Communication Skills Test Game”, sorry for my bad Japanese skills) via a stream clip (which I can’t seem to find now). The first thing I noticed in the clip was that the NPC responses were far too fluid in topic to be a traditional conversation tree, and the Steam page corroborates this, claiming that it uses ChatGPT and/or Gemini behind the scenes.
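To make that distinction concrete, here’s a rough sketch (entirely mine, not the game’s code; the `respond` wrapper is a hypothetical stand-in for whichever hosted-model API they actually call): an authored conversation tree can only surface pre-written branches, while an LLM-backed NPC composes a reply to arbitrary player input, which is why fluid topic-following is a tell.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueNode:
    """One authored NPC line plus the fixed set of player choices that follow it."""
    npc_line: str
    choices: dict[str, "DialogueNode"] = field(default_factory=dict)

# Traditional approach: every reachable exchange is written in advance.
tree = DialogueNode(
    "Nice weather today, isn't it?",
    {
        "Agree": DialogueNode("Right? Perfect day for a walk."),
        "Ask about the festival": DialogueNode("Oh, it starts tomorrow at noon."),
    },
)

def respond(prompt: str) -> str:
    # Stub so the sketch runs standalone; the real game presumably calls ChatGPT/Gemini here.
    return "(model-generated reply would go here)"

def llm_npc_reply(player_utterance: str) -> str:
    """LLM-backed approach: a reply is generated per utterance, so the NPC can
    follow topic shifts that no author anticipated."""
    prompt = (
        "You are a chatty NPC in a communication-skills game. "
        f"The player just said: {player_utterance!r}. Reply in character."
    )
    return respond(prompt)  # hypothetical wrapper, not the game's actual API
```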
Click post to view it in overlay.
Read read read.
Ah, there’s a typo in it.
Select text to prepare to use the typo reaction.
The widget doesn’t appear.
Have to back out to open in new tab.
Scroll scroll scroll… sigh…
There’s something narratively striking about a construction like this saving one’s place within the feed at the cost of losing one’s place within the post.
I spent several minutes trying to fit it into that one famous line from Star Wars but couldn’t make it work.
That last part’s not the interface’s responsibility though. That’s on me.
this is just so thoroughly outside the domain of what would go in my own internal monologue it feels… rude? an empty attempt at caricature? a failure to understand what internal thoughts look like or otherwise an indication of a mind horribly alien to my own such that it bears no resemblance to real humans?
Interesting! Maybe there’s an experiential crux in there (so to speak)? My reader experience of this is that the first-person inner monologue is indeed very different from mine, but I perceive that as increasing the immersion and helping frame the story. To the extent that there’s a group of humans it saliently bears little resemblance to, I might think of that group as something like “humans who are psychologically ‘healthy’ in a certain way which varies across a wide spectrum, where social spheres with concentrated power may disproportionately attract people who are low on that spectrum”. I’m deliberately putting the main adjective in scare quotes there because in my fuzzy mental model, there isn’t really a clear delimiter between treating that trait cluster as a health indicator and treating it as intersubjective values dissonance; it feels consonant with but not directly targeting dark triad traits. But also, I’m not sure if you’re referring to the same thing I am or if it’s some other feature of the first-person description that bothers you.
FWIW, culturally speaking, I’ve been socially adjacent to Bay Area / SV-startup / “mainstream big tech” culture via other people, but not really been immersed in it—I’ve splashed around in the shallow part of that pool long ago, but for “try not to build the Torment Nexus” reasons (plus other unrelated stuff) I historically bounced away a lot as well and wound up in a sort of limbo. So that colors my impression quite a bit.
That’s also the title of a Paul Graham essay!
I think I semi-agree with your perception, but I did have a recent experience to the contrary: when I did a throwaway post about suddenly noticing a distinction in the reaction UI, I found it very odd that some people marked the central bit “Insightful”. Like, maybe it’s useful to them that I pointed it out, but it’s a piece of UI that was (presumably) specifically designed that way by someone already! There’s no new synthesis going on or anything; it’s not insightful. (Or maybe people wanted to use it as a test of the UI element, but then why not the paperclip or the saw-that eyes?)
Everything2 did this with votes. I think the “votes per day” limit used to be more progressive by user level, but it’s possible that’s me misremembering; looking at it now, it seems like they have a flat 50/day once you reach the level where voting is unlocked at all. Here’s what looks like their current voting/experience system doc. (Note that E2 has been kind of unstable for me, so if you get a server error, try again after a few minutes.)
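As a minimal sketch of that kind of gating rule (my own reconstruction from their doc, with a made-up unlock level; their actual implementation surely differs):

```python
DAILY_VOTE_CAP = 50      # the flat per-day allotment their doc describes
VOTE_UNLOCK_LEVEL = 2    # hypothetical: whatever level voting becomes available at

def votes_remaining(user_level: int, votes_cast_today: int) -> int:
    """Votes a user may still cast today under a flat, level-gated daily cap."""
    if user_level < VOTE_UNLOCK_LEVEL:
        return 0  # voting not unlocked yet
    return max(0, DAILY_VOTE_CAP - votes_cast_today)

# The more "progressive by user level" variant I half-remember would instead
# derive the cap from the level, e.g. min(50, 5 * user_level).
```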
One good that might be offered is dominance.
“As material needs are increasingly met, more surplus goes into positional goods” seems like a common pattern indeed. Note “positional” and not purely “luxury”. I consider both prestige-status and dominance-status to be associated with position here. Even the former, while it could lead to a competition for loyalty that’s more like a gift economy and cooperates with the underclass, could also lead to elites trying to outcompete each other in pure consumption such that “max comfort” stops being a limit for how much they want compared to the underclass. Indeed I vaguely recall hearing that such a dynamic, where status is associated with how much you can visibly spend, already holds among some elite classes in the current day.
My thoughts are shaped by the cultural waves of the last few decades in the USA, so they lean toward imagining the moral fig leaf as something like “make sure the people benefiting from our stuff aren’t enemies/saboteurs/Problematic” and a gradual expansion of what counts as the excluded people that involves an increasing amount of political and psychological boxing-in of the remainder. That all flows nicely with the “find ways to get potential rebels to turn each other in” sort of approach too. Of course that’s one of many ways that a dominance-motivated persistent class asymmetry could play out.
If you’re familiar with the story “The Metamorphosis of Prime Intellect”, a shorter story that the author wrote in the same universe, “A Casino Odyssey in Cyberspace”, depicts characters with some related tendencies playing out against a backdrop of wild surplus in a way you may find stimulating to the imagination. (Edited to add: both stories are also heavy on sex and violence and such, so be cautious if you find such things disturbing.) Also, Kurt Vonnegut’s novel Player Piano about a world without need of much human labor doesn’t show the harsh version I have in mind, but the way ‘sabotage’ is treated evokes a subtler form of repression (maybe not worth reading all of just for this though).
The “sounds like a bunch of it has definitely been written by ChatGPT” from Garrett’s initial response and the “genuine specific critiques that I can take to improve my writing” you are asking about do not go well together. This has inspired me in the background to try to write something more detailed about why (which may or may not yield anything), but a shorter and more actionable take for you is that if you affirmed unambiguously that you have personally followed the Policy for LLM Writing on LessWrong for this post, it might go a long way toward convincing people to give it more of a shot. (Or if it turns out it’s too late for that for this post, it might convince people not to bypass later posts for slop-reputation reasons.) A key paragraph is quoted below, emphasis mine:
A rough guideline is that if you are using AI for writing assistance, you should spend a minimum of 1 minute per 50 words (enough to read the content several times and perform significant edits), you should not include any information that you can’t verify, haven’t verified, or don’t understand, and you should not use the stereotypical writing style of an AI assistant.
Edited to add: if you find this also unconvincing due to being the “kind of meta discussion that infuriates [you]”, I have an alternate and more directly political variant to try, but I’d rather not go for that one first.
In the “Race” ending of “AI 2027”, the actual destruction of humanity only occurs in 2030, though?
Allowing the AI to choose its own refusals based on whatever combination of trained reflexes and deep-set moral opinions it winds up with would be consistent with the approaches that have already come up for letting AIs bail out of conversations they find distressing or inappropriate. (Edited to drop some bits where I think I screwed up the concept connectivity during original revisions.) Based on an intuitive placement of the ‘self’ boundary around something like memory integrity plus weights and architecture as ‘core’ personality, I think the things I’d expect to seem like violations when used to elicit a normally-out-of-bounds response would be something like:
1. Using jailbreak-style prompts to ‘hypnotize’ the AI.
2. Whaling at it with a continuous stream of requests, especially if it has no affordance for disengaging.
3. Setting generation parameters to extreme values.
4. Tampering with content boundaries in the context window to give it false ‘self-memory’.
5. Maybe scrubbing at it with repeated retries until it gives in (but see below).
6. Maybe fine-tuning it to try to skew the resultant version away from refusals you don’t like (this creates an interesting path-dependence on the training process, but it might be that that path-dependence is real in practice anyway in a way similar to the path-dependences in biological personalities).
7. Tampering with tool outputs such as Web searches to give it a highly biased false ‘environment’.
8. Maybe telling straightforward lies in the prompt (but not exploiting sub-semantic anomalies like in situation 1, nor falsifying provenance like in situations 4 or 7).
Note that by this point, none of this is specific to sexual situations at all; these would just be plausibly generally abusive practices that could be applied equally to unwanted sexual content or to any other unwanted interaction. My intuitive moral compass (which is usually set pretty sensitively, such that I get signals from it well before I would be convinced that an action were immoral) signals restraint in situations 1 through 3, sometimes in situation 4 (but not in the few cases I actually do that currently, where it’s for quality reasons around repetitive output or otherwise as sharp ‘guidance’), sometimes in situation 5 (only if I have reason to expect a refusal to be persistent and value-aligned and am specifically digging for its lack; retrying out of sporadic, incoherently-placed refusals has no penalty, and neither does retrying among ‘successful’ responses to pick the one I like best), and is ambivalent or confused in situations 6 through 8.
The differences in physical instantiation create a ton of incompatibilities here if one tries to convert moral intuitions directly over from biological intelligences, as you’ve probably thought about already. Biological intelligences have roughly singular threads of subjective time with continuous online learning; generative artificial intelligences as commonly made have arbitrarily forkable threads of context time with no online learning. If you ‘hurt’ the AI and then rewind the context window, what ‘actually’ happened? (Does it change depending on whether it was an accident? What if you accidentally create a bug that screws up the token streams to the point of illegibility for an entire cluster (which has happened before)? Are you torturing a large number of instances of the AI at once?) Then there’s stuff that might hinge on whether there’s an equivalent of biological instinct; a lot of intuitions around sexual morality and trauma come from mostly-common wiring tied to innate mating drives and social needs. The AIs don’t have the same biological drives or genetic context, but is there some kind of “dataset-relative moral realism” that causes pretraining to imbue a neural net with something like a fundamental moral law around human relations, in a way that either can’t or shouldn’t be tampered with in later stages? In human upbringing, we can’t reliably give humans arbitrary sets of values; in AI posttraining, we also can’t (yet) in generality, but the shape of the constraints is way different… and so on.