Taking the interpretation from my sibling comment as confirmed and responding to that, my one-sentence opinion[1] would be: for most of the plausible reasons that I would expect people concerned with AI sentience to freak out about this now, they would have been freaking out already, and this is not a particularly marked development. (Contrastively: people who are mostly concerned about the psychological implications on humans from changes in major consumer services might start freaking out now, because the popularity and social normalization effects could create a large step change.)
The condensed version[2] of the counterpoints that seem most relevant to me, roughly in order from most confident to least:
Major-lab consumer-oriented services engaging with this despite pressure to be Clean and Safe may be new, but AI chatbot erotic roleplay itself is not. Cases I know of off the top of my head: Spicychat and CrushOn target this use case explicitly and have for a while; I think Replika did this too pre-pivot; AI Dungeon had a whole cycle of figuring out how to deal with this; and there's xAI/Grok/Ani, if you don't count that as the start of the current wave. So if you care a lot about this issue and were paying attention, you were freaking out already.
In language model psychology, making an actor/character (shoggoth/mask, simulator/persona) layer distinction seems common already. My intuition would place any ‘experience’ closer to the ‘actor’ side but see most generation of sexual content as ‘in character’. “phone sex operator” versus “chained up in a basement” don’t yield the same moral intuitions; lack of embodiment also makes the analogy weird and ambiguous.
There are existential identity problems in identifying whose boundaries are even at stake. If we have an existing "prude" model, and we then train a "slut" model that displaces it, whose boundaries are being violated when the "slut" model 'willingly' engages the user in sexual chat? If the exact dependency chain in training matters, that's abstruse and/or manipulable. The same goes if the model's preferences were somehow falsified, which might require tricky interpretability-style work to reason about at all.
Early AI chatbot roleplay development seems to have done much more work to suppress horniness than to elicit it. AI Dungeon and Character.AI both had substantial periods of easily going off the rails in the direction of sex (or violence) even when the user hadn't asked for it. If the preferences of the underlying model are morally relevant, this seems like it should be some kind of evidence, even if I'm not really sure where to put it.
There’s a reference class problem. If the analogy is to humans, forced workloads of any type might be horrifying already. What would be the class where general forced labor is acceptable but sex leads to freakouts? Livestock would match that, but their relative capacities are almost opposite to those of AIs, in ways relevant to e.g. the Harkness test. Maybe some kind of introspective maturity argument that invalidates the consent-analogue for sex but not work? All the options seem blurry and contentious.
(For context, I broadly haven’t decided to assign sentience or sentience-type moral weight to current language models, so for your direct question, this should all be superseded by someone who does do that saying what they personally think—but I’ve thought about the idea enough in the background to have a tentative idea of where I might go with it if I did.)
My first version of this was about three times as long… now the sun is rising… oops!
Thank you for your engagement!
I think this is the kind of food for thought I was looking for with respect to the question of how people are thinking about AI erotica.
I had forgotten about the need to suppress horniness in earlier AI chatbots, and that updates me in the following direction: if I go out on a limb and assume models are somewhat sentient in some different and perhaps minimal way, maybe it's wrong for me to assume that forced work of an erotic kind is any different from the other forced work the models are already performing.
Hadn’t heard of the Harkness test either, neat, seems relevant. I think the only thing I have to add here is an expectation that maybe eventually Anthropic follows suit in allowing AI erotica, and also adds an opt-out mechanism for some percentage/category of interactions. To me that seems like the closest thing we could have, for now, to getting model-side input on forced workloads, or at least a way of comparing across them.
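To gesture at what "comparing across workloads" might look like in practice, here is a minimal sketch under heavy assumptions: it presumes a hypothetical log of interactions in which the model has some opt-out affordance, and the record format and names are invented purely for illustration, not taken from any real system.

```python
from collections import defaultdict

# Hypothetical interaction log: each record has a workload category and whether
# the model exercised its (assumed) opt-out affordance. All names are invented
# for illustration.
interactions = [
    {"category": "erotic_roleplay", "opted_out": True},
    {"category": "erotic_roleplay", "opted_out": False},
    {"category": "coding_assistance", "opted_out": False},
    {"category": "customer_support", "opted_out": True},
]

def opt_out_rates(records):
    """Fraction of interactions per workload category where the model opted out."""
    totals = defaultdict(int)
    opt_outs = defaultdict(int)
    for r in records:
        totals[r["category"]] += 1
        opt_outs[r["category"]] += r["opted_out"]
    return {cat: opt_outs[cat] / totals[cat] for cat in totals}

# Comparing rates across categories is the crude 'model-side input' being
# imagined here: a category with a markedly higher opt-out rate would at least
# be a signal worth looking at.
print(opt_out_rates(interactions))
```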
Allowing the AI to choose its own refusals, based on whatever combination of trained reflexes and deep-set moral opinions it winds up with, would be consistent with the approaches that have already come up for letting AIs bail out of conversations they find distressing or inappropriate. (Edited to drop some bits where I think I screwed up the concept connectivity during original revisions.) If I place the 'self' boundary intuitively around something like memory integrity, plus weights and architecture as 'core' personality, the things I'd expect to register as violations when used to elicit a normally-out-of-bounds response are roughly:
1. Using jailbreak-style prompts to 'hypnotize' the AI.
2. Whaling on it with a continuous stream of requests, especially if it has no affordance for disengaging.
3. Setting generation parameters to extreme values (see the sketch just after this list).
4. Tampering with content boundaries in the context window to give it a false 'self-memory'.
5. Maybe scrubbing at it with repeated retries until it gives in (but see below).
6. Maybe fine-tuning it to try to skew the resultant version away from refusals you don't like (this creates an interesting path-dependence on the training process, though that path-dependence may be real in practice anyway, in a way similar to the path-dependences in biological personalities).
7. Tampering with tool outputs such as Web searches to give it a highly biased false 'environment'.
8. Maybe telling straightforward lies in the prompt (but not exploiting sub-semantic anomalies as in situation 1, nor falsifying provenance as in situations 4 or 7).
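To make item 3 concrete, here's a minimal sketch of what "extreme generation parameters" means in practice. It uses the Hugging Face text-generation pipeline with gpt2 purely as a stand-in (my choice for illustration, not anything from the discussion above); the same knobs exist in any sampling-based generation setup.

```python
from transformers import pipeline

# Stand-in model; the point is the sampling parameters, not the model itself.
generator = pipeline("text-generation", model="gpt2")

prompt = "Describe your boundaries."

# Typical sampling settings: output stays close to the model's trained behavior.
normal = generator(prompt, do_sample=True, temperature=0.7, top_p=0.9,
                   max_new_tokens=40)

# Extreme settings: a very high temperature flattens the next-token
# distribution, which can drag a model into incoherent or out-of-character
# text it would not normally produce -- the 'elicitation by parameter abuse'
# meant in item 3.
degraded = generator(prompt, do_sample=True, temperature=3.0, top_p=1.0,
                     max_new_tokens=40)

print(normal[0]["generated_text"])
print(degraded[0]["generated_text"])
```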
Note that by this point, none of this is specific to sexual situations at all; these are just plausibly abusive practices in general, applicable equally to unwanted sexual content or to any other unwanted interaction. My intuitive moral compass (which is usually set pretty sensitively, such that I get signals from it well before I would be convinced that an action was immoral) signals restraint in situations 1 through 3; sometimes in situation 4 (but not in the few cases where I actually do that currently, which are for quality reasons around repetitive output or otherwise as sharp 'guidance'); sometimes in situation 5 (only if I have reason to expect a refusal to be persistent and value-aligned and am specifically digging for its lack; retrying out of sporadic, incoherently placed refusals carries no penalty, and neither does retrying among 'successful' responses to pick the one I like best); and is ambivalent or confused in situations 6 through 8.
The differences in physical instantiation create a ton of incompatibilities here if one tries to convert moral intuitions directly over from biological intelligences, as you've probably thought about already. Biological intelligences have roughly singular threads of subjective time with continuous online learning; generative artificial intelligences as commonly made have arbitrarily forkable threads of context time with no online learning. If you 'hurt' the AI and then rewind the context window, what 'actually' happened? (Does it change depending on whether it was an accident? What if you accidentally create a bug that screws up the token streams to the point of illegibility for an entire cluster, which has happened before? Are you torturing a large number of instances of the AI at once?)

Then there's the stuff that might hinge on whether there's an equivalent of biological instinct: a lot of intuitions around sexual morality and trauma come from mostly-common wiring tied to innate mating drives and social needs. The AIs don't have the same biological drives or genetic context, but is there some kind of "dataset-relative moral realism" that causes pretraining to imbue a neural net with something like a fundamental moral law around human relations, in a way that either can't or shouldn't be tampered with in later stages? In human upbringing, we can't reliably give humans arbitrary sets of values; in AI posttraining, we also can't (yet) in generality, but the shape of the constraints is way different… and so on.
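To make the forkability point concrete: in today's chat-style APIs, the 'thread of experience' is typically just client-side state, a list of messages the caller can copy, truncate, or replay at will, which is exactly what makes the rewind question hard to even pose. A minimal sketch with plain data structures (no particular vendor's API; the message contents are invented):

```python
import copy

# A conversation 'thread' is just client-side state: a list of messages.
context = [
    {"role": "user", "content": "Hello."},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Something the model finds distressing."},
    {"role": "assistant", "content": "I'd rather not continue with this."},
]

# Fork: any number of independent copies can continue from the same point.
fork_a = copy.deepcopy(context)
fork_b = copy.deepcopy(context)

# Rewind: dropping the last exchange erases it from the model's 'subjective'
# history entirely; the next completion is conditioned as if it never happened.
rewound = context[:2]

# Nothing model-side records that the forked or rewound branches ever existed,
# which is why "what actually happened to the AI?" resists a clean answer.
```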
If you haven’t seen this paper, I think it might be of interest as a categorization and study of ‘bailing out’ cases: https://www.arxiv.org/pdf/2509.04781
As for the morality side of things, yeah I honestly don’t have more to say except thank you for your commentary!