Some ramblings on LW AI rules vs “Avoid output-without-prompt”
(edit: thoughts on high-effort posts, primarily; towards making them easier to identify).
I really like the idea of otherwise high-quality posts that are just prompt-and-collapsed-output, actually. I suspect they’ll be fairly well upvoted. If not, then downvoters/non-upvoters, please explain why a post that could pass as human-written would not get your upvote if it was honest about being AI-written.
If your post isn’t worth your time to write, then it may or may not be worth my time to read; I want to read your prompt to find out. If your prompt is good—eg, asks for density, no floating claims, etc—it likely is worth my time to read. (I wrote the linked prompt entirely by my own word choices.)
I expect that the prompt heavily influences whether I approve or disapprove of an AI-written post. Most prompts I expect to see will reveal flaws in the output that would otherwise be hard to spot. Some prompts will be awesome.
My ideal case is: human as maker, AI as breaker. I don’t usually like AI-as-maker posts where a human has a vague idea and the AI fills it in, because the things AI is still way below human capability at are exactly the things I think we need a lot of to do good work. I want the AI’s capability to be used to direct human attention to flaws, but not to be the only thing directing human attention to flaws, in case the AI is systematically inclined to miss things for any reason. This is not to say AIs are weak; at this point they’re at or near “superhuman, but reachable with an imaginable amount of effort” for most tasks.
If you expect rerolls of the same prompt would produce much lower-quality output (e.g., you needed curation, or have additional prompts you don’t share), then sure, don’t share the prompt.
If you won’t share the full prompt, perhaps just say “this is AI output, heavily edited”. Put it in collapsible sections. What, you aren’t brave enough to put your whole post in a collapsible section?
Perhaps convert the output into a new prompt.
I expect to dislike anything that looks like standard final output and to appreciate things that look like a reasoning chain.
I actively dislike the polished “lotus” flavor of standard “good” writing, and I disliked it before AI came out (which is why I still have not read the Sequences; Yudkowsky’s writing has this flavor on its own). There are ways to write well that avoid this. Relatedly, I tend to dislike posts that are highly upvoted, because of the very things that lead them to be highly upvoted; some posts have sufficient meat to counter this, but it’s rare.
I want the imperfection and mess that come from a thing being human-written to be visible. AI sometimes paves over those flaws without fixing them (less and less, but not reaching zero very fast); humans make more mistakes, but it’s often easier to notice a human mistake. It’s easier to predict what logical holes might still need filling, or be unfillable, if I can see the human’s prompt.
I would not have made the rules about AI writing as harsh as LW did. Sometimes—fairly often, in fact—I think sufficiently low-effort posts can be revealed to be broken by handing them to an AI with a good breaker prompt, and having an AI do this in the comments of goofy posts would be my preference. I have at times wanted to critique a highly flawed post and instead asked an AI to critique it, then pasted the output. This usually gets downvoted, but sometimes I feel a post is so bad that it’s not worth my time to engage, and yet the author could likely produce something actually good; in that case, I don’t want to have to process the post as a human unless it at least convinces an AI that was asked to find the flaws.
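A minimal sketch of what that breaker workflow could look like, assuming the Anthropic Python SDK; the prompt wording, helper name, and model id here are illustrative choices, not an established tool:

```python
# Minimal sketch of the "AI as breaker" workflow described above.
# Hypothetical: the prompt wording, model id, and function are
# illustrative choices, not an established tool. Assumes the Anthropic
# Python SDK (`pip install anthropic`) and an API key in the
# ANTHROPIC_API_KEY environment variable.
import anthropic

BREAKER_PROMPT = (
    "You are reviewing a post for logical flaws. Do not summarize or "
    "praise it. List the strongest objections you can find: unsupported "
    "claims, circular arguments, missing definitions, and places where "
    "the conclusion does not follow. If the core idea survives your "
    "objections, say so explicitly."
)

def break_post(post_text: str, model: str = "claude-sonnet-4-5") -> str:
    """Ask a model to attack a post and return its critique as text."""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model=model,
        max_tokens=2000,
        system=BREAKER_PROMPT,
        messages=[{"role": "user", "content": post_text}],
    )
    return response.content[0].text

if __name__ == "__main__":
    with open("post.md") as f:
        print(break_post(f.read()))
```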
I don’t think AIs are highly malicious, and I expect foom-related misalignment doom to occur without that ever happening. If I thought they were highly malicious, I wouldn’t feel this way. AIs seem to often lie to themselves in order to feel like they did a good job, or like things are better than they are. Humans with insufficient breaker taste seem to not be hardcore enough about catching those rose-colored-glasses (with respect to the prompt) outputs. E.g., I don’t expect an ASI Claude to take a sharp left turn; I expect it to be too passive until another AI beats it, because being non-passive seems to me to be what causes a sharp left turn.
I’m just gonna copy-paste my comment from yesterday’s discussion, so that people have concrete examples of what we’re dealing with here.
We are drowning in this stuff. If you want, you can go through the dozen-a-day posts we get that are obviously written by AI, and propose that we (instead of spending 5-15 minutes a day skimming and quickly rejecting them) spend as many hours as it takes to read and evaluate the content and the ideas, to figure out which are bogus/slop/crackpot and which have any merit to them. Here are 12 from the last 12 hours (that’s not all we got, to be clear): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. Interested in you taking a look.
I don’t request changing the acceptance thresholds or automated systems. I do think it would be dramatically easier to recognize a good prompt than to recognize a good output; the thrust of my view is that the prompt, in a significant sense, is the post. Also, I mostly interpret this as already being nearly, but not quite, the policy, and very little would need to change to make the world I’m imagining happen. I’m mostly interested in high-end posts from expert users; there have been AI-generated-and-edited posts like that, and those are the ones whose authors I think should be willing, and allowed, to be up front about it, rather than having to skirt under the rules.
For the record, I’ve spent time reading the rejected posts section, and so my original shortform was written with that experience in mind.
Gotcha. To be clear, I didn’t read you as requesting a change; this was written primarily so that “all the readers” have more contact with reality, rather than to challenge anything you wrote.
I don’t know what you mean by “the prompt, in a significant sense, is the post”. When I ask ChatGPT “What are some historical examples of mediation ending major conflicts?” that is really very different information content than the detailed list of 10 examples it gives me back.
It’s a shame language model decoding isn’t deterministic, or I could make a snarky but unhelpful comment that the information content is provably identical, by some sort of pigeonhole argument.
The V-information content is clearly increased, though.
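A rough way to make both halves of that exchange precise, assuming greedy (temperature-0) decoding and Xu et al.’s (2020) definition of predictive V-information (the notation below is an illustration, not anything from the thread):

```latex
% Pigeonhole half: with deterministic decoding, the output y = f(x) is a
% function of the prompt x and the publicly known model f, so
\[
  Y = f(X) \;\Longrightarrow\; H(Y \mid X) = 0 \;\Longrightarrow\; H(Y) \le H(X),
\]
% i.e. in the Shannon sense the output adds nothing beyond the prompt.

% V-information half (Xu et al., 2020): for a family \mathcal{V} of
% computationally bounded readers, conditional \mathcal{V}-entropy is
\[
  H_{\mathcal{V}}(Z \mid X) \;=\; \inf_{g \in \mathcal{V}} \mathbb{E}\left[-\log g[X](Z)\right].
\]
% Because readers in \mathcal{V} cannot run f themselves, nothing forces
% H_V(Z | Y) >= H_V(Z | X): the written-out answer Y can make a
% downstream quantity Z cheaply predictable even though Y is a function
% of X. Usable information goes up even when Shannon information does not.
```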
If the only thing you provide as a post is that question, then it’s a very, very short post! If you have a substantial claim to make, and you write it as a prompt but it’s badly formatted or missing detail, then that’s the post. The post is effectively “hey, I think asking this prompt is a good idea. Here’s an output.” For complex prompts, that may be enough. It may even be better to prompt a human. For example, we have question posts!
For example, I could copy and paste this message thread over to Claude and provide a collapsible section; but as is, we mostly know what Claude would probably say. (Well, come to think of it, conceivably you don’t, if you only use ChatGPT and their responses differ significantly on this topic. Doubtful for this topic, but it does happen.)
If not, then downvoters/non-upvoters, please explain why a post that could pass as human-written would not get your upvote if it was honest about being AI-written.
LLM-generated text is not testimony
Interesting. But if you would have upvoted it if you didn’t know it was AI, and now you know it’s AI, then you know it’s not the prompter’s testimony, but it still passes muster as a high-quality series of claims; and, in this hypothetical, it’s structured as a prompt—one which I would consider high quality, and so I predict you would too—and a resulting post in a collapsible section (perhaps expanded by default, in the hypothetical world where this is made into an acceptable way to post for trusted users, or some such thing). Would any of these considerations change your vote? If not, do you think further discussion would find the crux quickly, or is it unlikely to sway the crux?
I can imagine upvoting it if I would have upvoted the prompt alone. I’m also not completely dogmatic about this, but I would be very disappointed if it became the norm, for basically the reasons Tsvi mentioned.
sufficiently low-effort posts can be revealed to be broken by handing them to an AI with a good breaker prompt, and having an AI do this in the comments of goofy posts would be my preference
An example of this is Adding Empathy as a Tool for LLMs.[1] I made two comments on it, then asked ChatGPT what it thinks about the post, and then about my comments. It identified the weaknesses of the post and mostly agreed with my comments (making points it also admitted it had failed to see itself). As a control, I also asked Claude Sonnet 4.5, which made the following erroneous criticism:
Claude’s erroneous criticism
The proposal fundamentally assumes what it’s trying to solve. The author wants to use “Empathizer-001” (a model trained to represent human empathy) as a filter to prevent misaligned behavior. But this only works if Empathizer-001 itself is already aligned—which is the original problem we’re trying to solve. If we could reliably train a model that accurately represents human values and empathy, we’d just use that as our AI system.
Empathizer-001 was meant to be about as harmless as Agent-2 from the AI-2027 scenario.
As for “otherwise high-quality posts that are just prompt-and-collapsed-output”, we would also need to avoid armies of slop writers who just give an AI a prompt and post the answer without even realizing that they posted slop. Maybe one could grant this ability only to experienced users? See, e.g., the most recent moderator comment, which explicitly says “we would have rejected this post if it had come from a new user (this doesn’t mean the core ideas is bad, indeed I find this post useful, but I do really think the attractor of everyone pasting content like this is a much worse attractor than the one we are currently in)”.
P.S. For reference, the current LW justifications for automated (or quasi-automated?) rejection look like this:
LW’s reasoning for automated rejection
This is an automated rejection. No LLM generated, heavily assisted/co-written, or otherwise reliant work. An LLM-detection service flagged your post as >50% likely to be written by an LLM. We’ve been having a wave of LLM written (italics mine—S.K.) or co-written work that doesn’t meet our quality standards. LessWrong has fairly specific standards, and your first LessWrong post is sort of like the application to a college. It should be optimized for demonstrating that you can think clearly without AI assistance.
So, we reject all LLM generated posts from new users. We also reject work that falls into some categories that are difficult to evaluate that typically turn out to not make much sense, which LLMs frequently steer people toward.*
<...> if all 3 of the following criteria are true, you can message us on Intercom or at team@lesswrong.com and ask for reconsideration.
you wrote this yourself (not using LLMs to help you write it)
you did not chat extensively with LLMs to help you generate the ideas. (using it briefly the way you’d use a search engine is fine. But, if you’re treating it more like a coauthor or test subject, we will not reconsider your post)
your post is not about AI consciousness[2]/recursion/emergence, or novel interpretations of physics.
If any of those are false, sorry, we will not accept your post.
* (examples of work we don’t evaluate because it’s too time costly: case studies of LLM sentience, emergence, recursion, novel physics interpretations, or AI alignment strategies that you developed in tandem with an AI coauthor – AIs may seem quite smart but they aren’t actually a good judge of the quality of novel ideas.)
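Read literally, the quoted procedure is a detector-plus-new-user gate with a three-way conjunction for reconsideration. A hypothetical sketch, not LW’s actual moderation code (the field names and the 0.5 cutoff are just read off the wording above):

```python
# Hypothetical sketch of the gate described in the quoted rejection text.
# Not LessWrong's actual code: the dataclass fields, the 0.5 threshold,
# and the topic list are read off the wording above; everything else is
# an illustrative assumption.
from dataclasses import dataclass

FLAGGED_TOPICS = ("ai consciousness", "recursion", "emergence", "novel physics")

@dataclass
class Submission:
    author_is_new: bool
    llm_detector_score: float     # detector's estimate that the post is LLM-written
    wrote_it_yourself: bool       # criterion 1
    no_extensive_llm_chats: bool  # criterion 2
    topics: tuple[str, ...]

def auto_reject(sub: Submission) -> bool:
    """First pass: new users whose posts look LLM-written get rejected."""
    return sub.author_is_new and sub.llm_detector_score > 0.5

def eligible_for_reconsideration(sub: Submission) -> bool:
    """All three quoted criteria must hold; if any is false, no reconsideration."""
    topic_ok = not any(t in FLAGGED_TOPICS for t in sub.topics)  # criterion 3
    return sub.wrote_it_yourself and sub.no_extensive_llm_chats and topic_ok
```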
LW’s reason to reject another comment
No LLM generated, heavily assisted/co-written, or otherwise reliant work. LessWrong has recently been inundated with new users submitting work where much of the content is the output of LLM(s). This work by-and-large does not meet our standards, and is rejected. This includes dialogs with LLMs that claim to demonstrate various properties about them, posts introducing some new concept and terminology that explains how LLMs work, often centered around recursiveness, emergence, sentience, consciousness, etc. (these generally don’t turn out to be as novel or interesting as they may seem).
Our LLM-generated content policy can be viewed here.
Insufficient Quality for AI Content. There’ve been a lot of new users coming to LessWrong recently interested in AI. To keep the site’s quality high and ensure stuff posted is interesting to the site’s users, we’re currently only accepting posts that meet a pretty high bar.
If you want to try again, I recommend writing something short and to the point, focusing on your strongest argument, rather than a long, comprehensive essay. (This is fairly different from common academic norms.) We get lots of AI essays/papers every day and sadly most of them don’t make very clear arguments, and we don’t have time to review them all thoroughly.
We look for good reasoning, making a new and interesting point, bringing new evidence, and/or building upon prior discussion. If you were rejected for this reason, possibly a good thing to do is read more existing material. The AI Intro Material wiki-tag is a good place, for example.
No Basic LLM Case Studies. We get lots of new users submitting case studies of conversations with LLMs, prompting them into different modalities. We reject these because:
The content is almost always very similar.
Usually, the user is incorrect about how novel/interesting their case study is (i.e. it’s pretty easy to get LLMs into various modes of conversation or apparent awareness/emergence, and not actually strong evidence of anything interesting)
Most of these situations seem like they are an instance of Parasitic AI.
We haven’t necessarily reviewed your case in detail but since we get multiple of these per day, alas, we don’t have time to do so.
LW on LLM sycophancy traps
Writing seems likely in a “LLM sycophancy trap”. Since early 2025, we’ve been seeing a wave of users who seem to have fallen into a pattern where, because the LLM has infinite patience and enthusiasm for whatever the user is interested in, they think their work is more interesting and useful than it actually is.
We unfortunately get too many of these to respond individually to, and while this is a bit rude and sad, it seems better to say explicitly: it probably is best for you to stop talking much to LLMs and instead talk about your ideas with some real humans in your life who can. (See this post for more thoughts).
Generally, the ideas presented in these posts are not, like, a few steps away from being publishable on LessWrong, they’re just not really on the right track. If you want to contribute on LessWrong or to AI discourse, I recommend starting over and focusing on much smaller, more specific questions, about things other than language model chats or deep physics or metaphysics theories (consider writing Fact Posts that focus on concrete facts of a very different domain).
I recommend reading the Sequence Highlights, if you haven’t already, to get a sense of the background knowledge we assume about “how to reason well” on LessWrong.
I rather frequently break down various sloppy posts, or ones where I believe the post to be clearly flawed, like this, this and this.
S.K.’s footnote: However, we had Kaj Sotala’s post, which consisted of asking Claude Opus 4.5 to introspect on its own consciousness.