sufficiently low-effort posts can be revealed to be broken by handing them to an AI with a good breaker prompt, and having an AI do this in the comments of goofy posts would be my preference.
An example of this is Adding Empathy as a Tool for LLMs.[1] I made two comments on it, then asked ChatGPT what it thought about the post, then about my comments. It identified the weaknesses of the post and mostly agreed with my comments (which it also claimed to have failed to see). As a control, I also asked Claude Sonnet 4.5, which produced the following erroneous criticism:
Claude’s erroneous criticism
The proposal fundamentally assumes what it’s trying to solve. The author wants to use “Empathizer-001” (a model trained to represent human empathy) as a filter to prevent misaligned behavior. But this only works if Empathizer-001 itself is already aligned—which is the original problem we’re trying to solve. If we could reliably train a model that accurately represents human values and empathy, we’d just use that as our AI system.
Empathizer-001 was meant to be about as harmless as Agent-2 from the AI-2027 scenario.
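For concreteness, here is a minimal sketch of what such a "breaker" pass could look like, assuming the official OpenAI Python client; the model name, the prompt wording, and the break_post helper are illustrative placeholders of mine, not a proposal for how LessWrong should actually implement this.

```python
# Minimal sketch of an automated "breaker" pass over a post.
# Assumes the official OpenAI Python client; the model name, prompt wording,
# and helper function are illustrative placeholders, not a tested pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

BREAKER_PROMPT = (
    "You are reviewing a forum post for a rationality site. "
    "Identify the load-bearing claims, then try to break the argument: "
    "point out hidden assumptions, circularity, and unsupported leaps. "
    "Be concrete and quote the post where possible."
)

def break_post(post_text: str, model: str = "gpt-4o") -> str:
    """Return an adversarial critique of `post_text`."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": BREAKER_PROMPT},
            {"role": "user", "content": post_text},
        ],
    )
    return response.choices[0].message.content

# Usage: print(break_post(open("post.md").read()))
```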
As for “otherwise high-quality posts that are just prompt-and-collapsed-output”, we would also need to avoid armies of slop writers who simply prompt an AI and post the answer without even realizing that they have posted slop. Maybe one could grant this ability only to experienced users? See, e.g., the most recent moderator comment, which explicitly says: “we would have rejected this post if it had come from a new user (this doesn’t mean the core ideas is bad, indeed I find this post useful, but I do really think the attractor of everyone pasting content like this is a much worse attractor than the one we are currently in)”.
P.S. For reference, the current LW justifications for automated (or quasi-automated?) rejection look like this:
LW’s reasoning for automated rejection
This is an automated rejection. No LLM generated, heavily assisted/co-written, or otherwise reliant work. An LLM-detection service flagged your post as >50% likely to be written by an LLM. We’ve been having a wave of LLM written (italics mine—S.K.) or co-written work that doesn’t meet our quality standards. LessWrong has fairly specific standards, and your first LessWrong post is sort of like the application to a college. It should be optimized for demonstrating that you can think clearly without AI assistance.
So, we reject all LLM generated posts from new users. We also reject work that falls into some categories that are difficult to evaluate that typically turn out to not make much sense, which LLMs frequently steer people toward.*
<...> if all 3 of the following criteria are true, you can message us on Intercom or at team@lesswrong.com and ask for reconsideration.
1. you wrote this yourself (not using LLMs to help you write it);
2. you did not chat extensively with LLMs to help you generate the ideas (using it briefly the way you’d use a search engine is fine, but if you’re treating it more like a coauthor or test subject, we will not reconsider your post);
3. your post is not about AI consciousness[2]/recursion/emergence, or novel interpretations of physics.
If any of those are false, sorry, we will not accept your post.
* (examples of work we don’t evaluate because it’s too time costly: case studies of LLM sentience, emergence, recursion, novel physics interpretations, or AI alignment strategies that you developed in tandem with an AI coauthor – AIs may seem quite smart but they aren’t actually a good judge of the quality of novel ideas.)
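Read literally, the notice just quoted boils down to a fairly simple triage rule. The sketch below is my own reconstruction of that logic in Python, not LessWrong’s actual code; the 0.5 threshold and the topic list come from the quoted text, while the field names and the detector are hypothetical stand-ins.

```python
# My reconstruction of the quoted rejection logic, not LW's actual code.
# The 0.5 threshold and topic list come from the quoted notice; the
# dataclass fields and the detector score are hypothetical stand-ins.
from dataclasses import dataclass

REJECTED_TOPICS = {"ai consciousness", "recursion", "emergence", "novel physics"}

@dataclass
class Submission:
    author_is_new: bool
    llm_detector_score: float      # P(written by an LLM), from some detection service
    self_written: bool             # author claims no LLM help with the writing
    llm_used_as_coauthor: bool     # extensive LLM ideation, beyond search-engine use
    topics: set[str]

def auto_reject(sub: Submission) -> bool:
    """First-pass filter: reject likely-LLM posts from new users."""
    return sub.author_is_new and sub.llm_detector_score > 0.5

def eligible_for_reconsideration(sub: Submission) -> bool:
    """All three quoted criteria must hold for a manual second look."""
    return (
        sub.self_written
        and not sub.llm_used_as_coauthor
        and not (sub.topics & REJECTED_TOPICS)
    )
```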
LW’s reason to reject another comment
No LLM generated, heavily assisted/co-written, or otherwise reliant work. LessWrong has recently been inundated with new users submitting work where much of the content is the output of LLM(s). This work by-and-large does not meet our standards, and is rejected. This includes dialogs with LLMs that claim to demonstrate various properties about them, posts introducing some new concept and terminology that explains how LLMs work, often centered around recursiveness, emergence, sentience, consciousness, etc. (these generally don’t turn out to be as novel or interesting as they may seem).
Our LLM-generated content policy can be viewed here.
Insufficient Quality for AI Content. There’ve been a lot of new users coming to LessWrong recently interested in AI. To keep the site’s quality high and ensure stuff posted is interesting to the site’s users, we’re currently only accepting posts that meet a pretty high bar.
If you want to try again, I recommend writing something short and to the point, focusing on your strongest argument, rather than a long, comprehensive essay. (This is fairly different from common academic norms.) We get lots of AI essays/papers every day and sadly most of them don’t make very clear arguments, and we don’t have time to review them all thoroughly.
We look for good reasoning, making a new and interesting point, bringing new evidence, and/or building upon prior discussion. If you were rejected for this reason, possibly a good thing to do is read more existing material. The AI Intro Material wiki-tag is a good place, for example.
No Basic LLM Case Studies. We get lots of new users submitting case studies of conversations with LLMs, prompting them into different modalities. We reject these because:
The content is almost always very similar.
Usually, the user is incorrect about how novel/interesting their case study is (i.e. it’s pretty easy to get LLMs into various modes of conversation or apparent awareness/emergence, and not actually strong evidence of anything interesting)
Most of these situations seem like they are an instance of Parasitic AI.
We haven’t necessarily reviewed your case in detail but since we get multiple of these per day, alas, we don’t have time to do so.
LW on LLM sycophancy traps
Writing seems likely in a “LLM sycophancy trap”. Since early 2025, we’ve been seeing a wave of users who seem to have fallen into a pattern where, because the LLM has infinite patience and enthusiasm for whatever the user is interested in, they think their work is more interesting and useful than it actually is.
We unfortunately get too many of these to respond individually to, and while this is a bit rude and sad, it seems better to say explicitly: it probably is best for you to stop talking much to LLMs and instead talk about your ideas with some real humans in your life. (See this post for more thoughts).
Generally, the ideas presented in these posts are not, like, a few steps away from being publishable on LessWrong, they’re just not really on the right track. If you want to contribute on LessWrong or to AI discourse, I recommend starting over and focusing on much smaller, more specific questions, about things other than language model chats or deep physics or metaphysics theories (consider writing Fact Posts that focus on concrete facts of a very different domain).
I recommend reading the Sequence Highlights, if you haven’t already, to get a sense of the background knowledge we assume about “how to reason well” on LessWrong.
I fairly frequently break down sloppy posts, or posts that I believe to be clearly flawed, like this, this, and this.
S.K.’s footnote: However, there was Kaj Sotala’s post, which consisted of asking Claude Opus 4.5 to introspect on its own consciousness.