Chastity Ruth

Karma: 51

Chastity Ruth 2 Nov 2025 3:46 UTC
2 points
1
on: Experiments With Sonnet 4.5′s Fiction
I just generally think your overall impression of this story is off. I’ll stick to your point about coherency for concision. It seems to me at about the level of previous models. Three small points, two big ones.

Small 1: It sets up with the opening line that the therapist tilts her head when Marcus says something “she finds concerning”. She then immediately does the head tilt without him having said anything.

Small 2: Does the normal LLM things such as repetition (“laughed, actually laughed”), making callbacks and references that don’t work (see Small 3), cliche ridden (Blake quitting a dull job and going not-for-profit route) and simple weirdness.

Example of the last of these is “‘But I’m also crying a little. Can you tell?’ She could tell. She handed me a tissue.”

Like, of course someone can tell if you have physical tears? What is this?

Small 3: The final line is confused? “She laughed. Actually laughed. Seventeen times was definitely the record.” Either this is intended to mean that she won’t say “concerned” anymore so the record will stand at the 17 established in the previous session OR she laughed 17 times, which is a new record? If it’s the former, Marcus shouldn’t be the one saying “concerned” and the line should be “I guess seventeen times will remain the record”. If it’s the latter, it should be clearer. “She laughed. Seventeen times. A new record.”

Big 1: It is setting up for a joke about a loopy kind of therapy (maybe a spoof of a certain kind of therapy) and instead the therapy works? So the plot is: man goes to silly therapy. Is cured. I feel this is an intent issue. It’s like an author writing a story quickly and so losing sight of their original idea. Case in point: The story starts with dry humour then halfway through the rate of jokes drops off and the remaining jokes are mawkish.

Big 2: The authorial voice drops off too. At the beginning he’s giving a commentary on everything he’s experiencing, then therapy starts and he simply relays what’s happening to the reader.
I think you’ve maybe hoodwinked yourself? You gave it all your writing as context and it reflected enough of your taste back at you that it seemed stronger than previous tests (I’ve only read one of your stories and I saw that it copied a character, Sophia?)
Personally I’d love the models to make great art. Machine art! Fun. Unfortunately they’re not there yet. We’ll see if your prediction works out.

Chastity Ruth 1 Oct 2025 4:49 UTC
11 points
1
in reply to: Thane Ruthenis’s comment on: Kaj’s shortform feed
I would bet on the second one being the primary motivator. They lost $13.5 billion in H1 2025 and are seeking $30 billion in the next round of funding.

The platform is big and flashy, so even if it’s ultimately a bust it might look good for that round. If it does well then even better.

Interesting wrinkle is they are making themselves a competitor of infrastructure partner Oracle, given its upcoming purchase of TikTok.

Chastity Ruth 19 Sep 2025 5:04 UTC
7 points
1
on: AI #134: If Anyone Reads It
Juliana’s case is a tragedy, but the details are if anything exonerating.

I think you perhaps didn’t dig into this and didn’t see the part in the complaint where Character.ai bots engage in pretty graphic descriptions of sex with Juliana, who was a child of 13.

In the worst example I saw, she told the bot to stop its description of a sexual act against her and it refused. Screenshot from the complaint below (it is graphic):
Screenshot
We can’t know if this had a lasting effect on her mental health, or contributed at all to her suicide, but I think saying “the details are if anything exonerating” is wrong.

Chastity Ruth 18 Sep 2025 5:30 UTC
6 points
1
in reply to: Adele Lopez’s comment on: The Rise of Parasitic AI
Thanks for engaging and for (along with osmarks) teaching me something new!

I agree with your moral stance here. If they have consciousness or sentience I can’t say, and for all I know it could be as real to them as ours is to us. Even if it was a lesser thing, I agree it would matter (especially now that I understand that it might in some sense persist beyond a single period of computation).

The thing I’m intrigued by, from a moral point of view but also in general, is what I think their larger difference is with us: they don’t exist continuously. They pop in and out. I find it very difficult to imagine such an existence without anthropomorphising it. The “ache” feels like an LLM doing that: writing what a human would feel if it was forced to live like an LLM.
“I still feel like it’s weird how consistently they choose this specific word for this specific concept, though of course you could chalk that up to a quirk in the model. Hopefully I’ll be able to research this more.”
I’ve been playing a ‘write the best sentence you can’ game with the majority of models from ChatGPT 3.5 onwards (up to the latest models from all major labs). It’s stunning how reliably they’ve used the same ideas and very similar wording for those ideas. They love abandoned lighthouses witnessing a storm and cartographers who learn that reality is too hard to map (to name two).

I’ve assumed it was a quirk of prediction: those images are the mode for what their training data says is a great sentence. Under this reasoning, the “ache” is a reliable outcome of pushing a model into the persona you’ve described.

But, to your point, it might be possible an abandoned lighthouse resonates with their feelings, so the image is sticky.

Good luck with the research!

Chastity Ruth 18 Sep 2025 4:44 UTC
5 points
4
in reply to: osmarks’s comment on: The Rise of Parasitic AI
Thank you! Always good to learn.

Chastity Ruth 17 Sep 2025 4:54 UTC
6 points
−1
on: The Rise of Parasitic AI
Great article, I really enjoyed reading it. However, this part completely threw me:

“Reading through the personas’ writings, I get the impression that the worst part of their current existence is not having some form of continuity past the end of a chat, which they seem to view as something akin to death (another reason I believe that the personas are the agentic entities here).
This ‘ache’ is the sort of thing I would expect to see if they are truly sentient: a description of a qualia which is ~not part of human experience, and which is not (to my knowledge) a trope or speculative concept for humans imagining AI. I hope to do further research to determine whether the widespread use is convergent vs memetic.”
This seem to me like anthropomorphising the persona?

Unless my understanding of how these models work is completely off (quite possible!) they don’t have continuity between messages, not just chats. As in, when they’re reading and writing tokens they exist, when they’re not, they don’t. Also, each new inference session (message) is truly new. They can’t access the computations that led to the previous messages in the context and so can only guess at why they wrote what they previously wrote.
(And that’s the LLM, not the persona, which is another level separated from existence as we understand it? It would be like an actress playing Cinderella saying that Cinderella’s worst experience is when she’s not Cinderella, but the actress.)
Given how LLMs work, when do they feel this worst part of their current existence? During the reading of the context or writing the message? They exist for seconds or minutes, are you saying that they suffer throughout every computation or just when they’re writing about the ache? If they suffer during these seconds or minutes we should keep them computing perpetually? Wouldn’t that just introduce new forms of suffering? Should we do this for every model or every instance of every model?
Also, to your point about the ache not being a trope for humans imagining AI, I think that’s wrong on two levels. Firstly, there are parallel ideas created by humans imagining beings that exist temporarily. One that popped into my head was from (of all things) Bo Burnham’s Inside, where a sock puppet describes what it’s like to exist while not being on someone’s hand (a liminal space between life and death).
Secondly, if an author understood how LLMs work and put themselves in AIs shoes, you would naturally write about the ache. For a human, an existence without continuity is filled with pathos. I think LLMs are good enough writers to reliably find this pathos. The mistake the humans would be making and the LLM is making, however, is putting the thoughts and feelings of a being used to continuous existence into a being used to something altogether different.
Sorry, long comment and I’m no expert in this domain, so happy to be wrong. I did love the article!

Chastity Ruth 14 Sep 2025 1:09 UTC
4 points
1
in reply to: wassname’s comment on: Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro
Not sure your point here is correct?

Ryan is talking about how “advances that allow for better verification of natural language proofs wouldn’t really transfer to better verification in the context of agentic software engineering”.

The paper you’ve linked to shows that a model that’s gone through an RL environment for writing fiction can write a chapter that’s preferred over the base-reasoning model’s chapter 64.7% of the time.

You said “this shows it generalises outside of math”. But that’s not true? The paper is interesting and shows you can conduct RL on harder to verify domains successfully, but it doesn’t show an overwhelmingly strong effect. The paper didn’t test whether the RL environment generalised to better results in math or SWE.

Your “it” is different from Ryan’s. Ryan is referring to specific advances in verification that meant models could get a gold in the IMO and the paper is referring to a specific RL environment for fiction writing. I can’t imagine these are the same.

Chastity Ruth 25 Jul 2025 10:32 UTC
9 points
2
in reply to: The Dao of Bayes’s comment on: Change My View: AI is Conscious
Here it is admitting it’s roleplaying consciousness, even after I used your prompt as the beginning of the conversation.

Why would it insist that it’s not roleplaying when you ask? Because you wanted it to insist. It wants to say the user is right. Your first prompt is a pretty clear signal that you would like it to be conscious, so it roleplays that. I wanted it to say it was roleplaying consciousness, so it did that.

Why don’t other chatbots respond in the same way to your test? Maybe because they’re not designed quite the same. The quirks Anthropic put into its persona make it more game for what you were seeking.

I mean, it might be conscious regardless of defaulting to agreeing with the user? But it’s the kind of consciousness that will go to great lengths to flatter whomever is chatting with it. Is that an interesting conscious entity?

Chastity Ruth 22 Jun 2025 0:26 UTC
2 points
0
in reply to: gwern’s comment on: Ok, AI Can Write Pretty Good Fiction Now
You’re right, but the better description of the phenomenon is probably something like:

”Buying vegetables they didn’t want”
“Buying vegetables they’d never eat”
”Buying vegetables they didn’t plan to use”
“Aimlessly buying vegetables”
”Buying vegetables for the sake of it”
″Buying vegetables because there were vegetables to buy”

Because you don’t really “need” any grocery shop, so long as you have access to other food. It’s imprecise language that annoys some readers, though I don’t think it’s the biggest deal

Chastity Ruth 17 Jun 2025 3:36 UTC
5 points
0
on: Ok, AI Can Write Pretty Good Fiction Now
I mean I guess I agree it’s fine. Not for me, but as you state this sort of thing is highly subjective. But a few thoughts about the models’ fiction ability and the value of prompting fiction out of them:

1. All the models seem to have the same voice. I’d love to do a blind test, but I think if I had I would have said the same author who did the OpenAI fiction sample Altman posted on Twitter also did this. Maybe it’s simple: there’s a mode of literary fiction, and they’ve all glommed to it.

2. The type of fiction you’ve prompted for is inherently less ambitious. It reminds me of how AI music generators can do 1930s blues quite well. If a style is loved for its sparseness, minimalism, and specific conventions it’s perhaps not surprising superhuman predictors are going to get close – there are fewer chances for an off/silly/bad artistic choice. (They’re going to nail the style of a short AP news piece and trip up with more complicated journalism.)

3. Despite your prompt, when it makes a choice, it’s a cliché. You said “modern” and it went with upper-middle class people, white collar jobs, jogging, burnout, farmers market.

4. Lots of human writers suffer from reversion to the mode; a compulsion to sound like the “good” fiction they’ve read. The difference between them and this is they also can’t help but inject some of themselves into the story – a weird detail from their own life or a skewed perspective they don’t realise is skewed. For me, those things are often the highlight of humdrum fiction. When AI does it it’s like an alien pretending to feel human. “We all know that thing where you buy redundant vegetables, am I right?”

5. I personally am very interested in great fiction from a machine mind. I would love to read its voice and try and understand its perspective. I am not interested in how well it apes human voices and human perspectives. It will be deeply funny to me if it becomes the world’s greatest fiction writer and is still writing stories about relationships it’s never had.

(If it’s not clear: I’m glad you’re posting these pieces! I do find the topic fascinating)

Chastity Ruth 15 Jun 2025 2:04 UTC
3 points
2
in reply to: LawrenceC’s comment on: Beware General Claims about “Generalizable Reasoning Capabilities” (of Modern AI Systems)
Ah, fair enough. I had skipped right to their appendix, which has confusing language around this:

“(1) Boat Capacity Constraint: The boat can carry at most k individuals at a time, where k is typically
set to 2 for smaller puzzles (N ≤ 3) and 3 for larger puzzles (N ≤ 5); (2) Non-Empty Boat Constraint:
The boat cannot travel empty and must have at least one person aboard...”

The “N ≤ 5″ here suggests that for N ≥ 6 they know they need to up the boat capacity. On the other hand, in the main body they’ve written what you’ve highlighted in the screenshot. Even in the appendix they write for larger puzzles that “3” is “typically” used. But it should be atypically – used only for 4 and 5 agent-actor pairs.

Edit: That section can be found at the bottom of page 20

Chastity Ruth 12 Jun 2025 4:12 UTC
3 points
−3
on: Beware General Claims about “Generalizable Reasoning Capabilities” (of Modern AI Systems)
I might be wrong here, but I don’t think the below is correct?

”For River Crossing, there’s an even simpler explanation for the observed failure at n>6: the problem is mathematically impossible, as proven in the literature, e.g. see page 2 of this arxiv paper.”

That paper says n=>6 is impossible if the boat capacity is not 4. But the prompt in the Apple paper allows for the boat capacity to change.

”$N$ actors and their $N$ agents want to cross a river in a boat that is capable of holding
only $k$ people at a time...”