New LessWrong Editor! (Also, an update to our LLM policy.)
There’s a new editor experience on LessWrong! Much of the editor page has been rearranged to make it far more WYSIWYG relative to published post pages. All of the settings live in panels that are hidden by default and can be opened by clicking the relevant buttons on the side of the screen. We also adopted Lexical as the new editor framework powering everything behind the scenes (we were previously using CKEditor).
That scary arrow button in the top-left doesn’t publish your post! It just opens the publishing menu.
Posts[1] now have automatic real-time autosave while you’re online (like Google Docs), but still support offline editing if your connection drops out. Point-in-time revisions will still get autosaved periodically, and you can always manually save your draft if you want a specific checkpoint.
The editor also has a slash menu now!
Good for many of your custom content needs!
You might be eyeing the last two items in that slash menu. This post will demo some of the new features, and I’ll demo two of them simultaneously by letting Opus 4.6 explain what they are:
Hi! I’m Claude, and I’m writing this from inside the post you’re reading right now. This block I’m in is one of the new editor features — let me walk you through a few of them.
LLM Content Blocks
This visually distinct block is an LLM Content Block. Authors can insert these into their posts to clearly attribute a section to a specific AI model. The block header shows which model generated the content, so readers always know what they’re looking at. It’s a way to be transparent about AI-assisted writing while keeping everything in one document.
Custom Iframe Widgets
The new editor supports custom interactive widgets embedded directly in posts. Authors can write HTML and JavaScript that runs in a sandboxed iframe right in the document — useful for interactive demos, visualizations, small tools, or anything else that benefits from being more than static text. There’s one just below this block, in fact.
Agent Integration
The editor now has an API that lets AI agents read and edit drafts collaboratively. If you share your draft’s edit link with an AI assistant (like me), it can insert text, leave Google Docs-style comments, make suggested edits, and add LLM content blocks and widgets — all showing up live in the editor. That’s how this entire block was written: not copy-pasted in, but inserted directly through the API while the post was open for editing.
To use it, open your post’s sharing settings and set “Anyone with the link can” to Edit, then copy the edit URL and share it with your AI assistant.
With Edit permissions, the agent can do everything: insert and modify text, add widgets, create LLM content blocks, and more. If you’d prefer to keep tighter control, Comment permissions still allow the agent to leave inline comments and suggested edits, which you can accept or reject individually.
Setup depends on which AI tool you’re using. Agent harnesses that can make HTTP requests directly — like Claude Code, Codex, or Cursor — should work out of the box. If you’re using Claude on claude.ai, you’ll need to add www.lesswrong.com to your allowed domains settings, then start a new chat. (The ChatGPT web UI doesn’t currently support whitelisting external domains, so it can’t be used for this feature yet.) Once that’s done, just paste your edit URL and ask Claude to read the post — the API is self-describing, so it’ll figure out the rest from there.
And here’s a small interactive widget, also written by Claude[2], to demonstrate custom iframe widgets:
Policy on LLM Use
You might be wondering what this means for our policy on LLM use.
Our initial policy was this:
A rough guideline is that if you are using AI for writing assistance, you should spend a minimum of 1 minute per 50 words (enough to read the content several times and perform significant edits), you should not include any information that you can’t verify, haven’t verified, or don’t understand, and you should not use the stereotypical writing style of an AI assistant.
You were also permitted to put LLM-generated content into collapsible sections, if you labeled it as LLM-generated.
In practice, the “you should not use the stereotypical writing style of an AI assistant” part of the requirement meant that this was a de-facto ban on LLM use, which we enforced mostly consistently on new users and very inconsistently on existing users[3]. Bad!
To motivate our updated policy, we must first do some philosophy. Why do we care about knowing whether something we’re reading was generated by an LLM? The post “LLM-generated text is not testimony” has substantially informed my thinking on this question. Take the synopsis:
1. When we share words with each other, we don’t only care about the words themselves. We care also—even primarily—about the mental elements of the human mind/agency that produced the words. What we want to engage with is those mental elements.
2. As of 2025, LLM text does not have those elements behind it.
3. Therefore LLM text categorically does not serve the role for communication that is served by real text.
4. Therefore the norm should be that you don’t share LLM text as if someone wrote it. And, it is inadvisable to read LLM text that someone else shares as though someone wrote it.
I don’t think you even need to confidently believe in point 2[4] for the norm in point 4 to be compelling. It is merely sufficient that someone else produced the text.
Plagiarism is often considered bad because it’s “stealing credit” for someone else’s work. But it’s also bad because it’s misinforming your readers about your beliefs and mental models! What happens if someone asks you why you’re so confident about [proposition X]? It really sucks if the answer is “Oh, uh, I didn’t write that sentence, and re-reading it, it turns out I’m not actually that confident in that claim...”
This is also why having LLMs “edit” your writing is often pernicious. LLM editing, unless managed extremely carefully, often involves rephrasings, added qualifiers, and swapped vocabulary in ways that meaningfully change the semantic content of your writing. Very often this is in unendorsed ways, but this can be hard to pick up on because the typical LLM writing style has a tendency to make people’s eyes slide off of it[5].
With all that in mind, our new policy is this:
“LLM output” includes all of:
text written entirely by an LLM
text that was written by a human and then substantially[6] edited or revised by an LLM
text that was written by an LLM and then edited or revised by a human
“LLM output” does not include:
text that was written by a human and then lightly edited or revised by an LLM
text written by a human, which includes facts, arguments, examples, etc, which were researched/discovered/developed with LLM assistance. (If you “borrow language” from the LLM, that no longer counts as “text written by a human”.)
code (either in code blocks or in the new widgets)
“LLM output” must go into the new LLM content blocks. You can put “LLM output” into a collapsible section without wrapping it in an LLM content block if all of the content is “LLM output”. If it’s mixed, you should use LLM content blocks within the collapsible section to demarcate those parts which are “LLM output”.
We are going to be more strictly enforcing the “no LLM output” rule by normalizing our auto-moderation logic to treat posts by approved[7] users similarly to posts by new users—that is, they’ll be automatically rejected if they score above a certain threshold in our automated LLM content detection pipeline. Having spent a few months staring at what’s been coming down the pipe, we are also going to be lowering that threshold.
This does not change our existing quality bar for new user submissions. If you are a new user and submit a post that substantially consists of content inside of LLM content blocks, it is pretty unlikely that it will get approved[8]. This does not suddenly become wise if you’re an approved user. If you’re confident that people will want to read it, then sure, go ahead, but please pay close attention to the kind of feedback you get (karma, comments, etc), and if this proves noisy we’ll probably just tell people to cut it out.
As always, please submit feedback, questions, and bug reports via Intercom (or in the comments below, if you prefer).
- ^
Not comments or other content types that use the editor, like tags—those still have the same local backup mechanism they’ve always had, and you can still explicitly save draft comments, but none of them get automatically synced to the cloud as you type. Also, existing posts and drafts will continue to use the previous editor, and won’t have access to the new features.
- ^
Prompted by @jimrandomh.
- ^
For somewhat contingent reasons involving various choices we made with our moderation setup.
- ^
See my curation notice on that post for some additional thoughts and caveats.
- ^
I think this recent thread is instructive.
- ^
We’ll know it when we see it.
- ^
In the ontology of our codebase, a term which means “users whose content goes live without further review by the admins”, which is not true of users who haven’t posted or commented before, and is also not true of a smaller number of users who have.
- ^
I’m sure the people reading this will be able to conjure up some edge cases and counterexamples; go on, have fun.
My somewhat-dissenting-mod-opinion:
I feel more worried than I think Habryka and Robert are about LLM content corroding LW culture, and I think I maybe have a slightly different take on why LLM-generated text is not testimony matters. (For me, it’s not the most important thing that people are making ‘I’ statements that are false if an LLM said them. More significant to me is the implicit vouching that each statement is interesting and is some kind of worthwhile piece of a broader conversation, whether it’s your opinion or not.)
If I were making the policy and new features, I’d be framing it like:
I resonated with Justis’ Don’t Let LLMs Write for You, including the comments where some people pushed back and were like “but, I mean, clearly letting them write for you a bit is reasonable” and Justis responding “okay, but, like, the vast majority of people who think they are skilled enough to do this are making a mistake.” Yeah there is some nuance here, but, I think this makes more sense as a default recommendation.
Mechanistically, the main thing I’d change is making the LLM blocks look more like “blockquotes” than “slightly different paragraphs”, with an implied cultural nudge of “LLM text is not main body text; it’s a thing you can quote, not a thing that half-paying-attention-at-lunch people who are skimming should risk misinterpreting as part of the main text.”
(caveat: I think processes like Neel’s that involve a lot of LLM editing but begin and end with a lot of human involvement are fine, in particular if the end result isn’t distinguishable).
There are downsides to this that Habryka is more worried about, and I think I might change my mind later when LLM writing actually improves, but I don’t think we’re there yet and it doesn’t make sense to pre-emptively pave the way for LLM-assisted-writing-world.
I’d like to agree with and amplify this. Writing something yourself is a proof-of-work that LLM use radically reduces, if not destroys.
I think it would be good for you guys to set up agentic LLMs and let them be first class named authors and earn karma by solo posting. Like moltbook, basically, but way way way more controlled.
Then any time anyone wants to submit text, they have to mark the LLM model(s) they used as co-author(s), and the persistent/agentic version of that LLM can then endorse or not-endorse the end product as an agentic output based on its persistent database, and crons, and heuristics, and whatever.
The agent could get a blurb at the end of any article that the underlying model helped co-author, and the blurb could perhaps say “this text might be partially generated by my underlying model, but I don’t, on reflection, using my LW memory cache and lots of thinking time, and some webfetching, and more thinking… actually endorse that”...
Or the local agent could endorse it!
Or whatever. Different models have different tendencies, in my experience. And personally, I want to find the good ones to ally with.
I often write posts by dictating a verbatim rough draft, giving the audio to Gemini along with a bunch of samples of my past writing and instructions to preserve my voice as much as possible, and then editing what comes out until I’m happy (but in practice it’s close enough to my voice that this is just light editing). Under these rules, would I need to put the whole post in an LLM output block?
EDIT: On reflection, the thing that annoys me about this policy is that it lumps many kinds of LLM assistance, with varying amounts of human investment, into an intrusive format that naively reads to me as “this is LLM slop which you should ignore”.
For example, under my current reading, I would need to label several popular and widely read posts of mine as LLM content (my amount of editing varied from light to heavy between the posts, but LLM assistance was substantial). I think it would have been pretty destructive to make me label each post as LLM written (in practice I would have either violated the policy, or posted on a personal blog and maybe shared a link here)
https://www.lesswrong.com/posts/StENzDcD3kpfGJssR/a-pragmatic-vision-for-interpretability https://www.lesswrong.com/posts/jP9KDyMkchuv6tHwm/how-to-become-a-mechanistic-interpretability-researcher https://www.lesswrong.com/posts/G9HdpyREaCbFJjKu5/it-is-reasonable-to-research-how-to-use-model-internals-in https://www.lesswrong.com/posts/MnkeepcGirnJn736j/how-can-interpretability-researchers-help-agi-go-well
I would feel better about eg self selecting a tag for the post about how much an LLM was integrated into the writing process, with a spectrum of options rather than a binary
FWIW, this wouldn’t achieve approximately any of the goals of the above policy. The whole point of the policy is to maintain speech as testimony on LessWrong. Having a post that is “50% AI written” basically doesn’t help at all with that. LessWrong post writing should frequently and routinely refer to internal experiences like “I was surprised by X” or “Y felt off to me”, and if the LLMs wrote a section with those kinds of phrases, usually no amount of editing will restore meaningful testimony. A post that mixes LLM-invented internal experiences with actual experiences a person had is failing on this dimension, even if labeled as such.
Fair enough. How about “I stand by the content of this piece as much as if I’d written it myself”? In my case, most but not all of the phrasing and wording is written by me, and I would cut anything the LLM added that I considered false testimony
I basically don’t trust people to correctly make this call, especially as LLMs get smarter and more persuasive.
I certainly don’t trust the daily deluge of new users who have this in their posts yet are substantially producing slop.
If you don’t trust the user, why does the policy matter? Surely you need some way to gauge post quality regardless
I have been surprised by how bad people are at assessing whether this is actually true, but I do think it’s roughly the actual standard I have for putting content into LLM content blocks.
I would be fine with people messaging us on Intercom before publication and being like “hey, this was more heavily AI-edited but I do actually stand behind it all in testimony, can you sanity-check that that seems right to you?”, and then we can give people permission to skip the LLM content blocks. This does seem like a bit of a pain for the people involved, but I don’t super know what else to do.
Is this a problem where people in full generality are surprisingly bad at assessing LLM content, or is it more of a skill issue where we might expect the clever high-karma users to do it well and new users to be less trustworthy with it?
I wish it was the latter, but my current sense is a bunch of high karma users have been making mistakes in this direction as well (less than new users, but still too frequently).
Huh, that matches my experience that I’ve never noticed LLM-heavy writing done well, which is weird because from first principles it really seems like it shouldn’t be that hard for a good user to do.
I’m doing the same—verbatim dictating the text, giving the transcript to Claude with some of my past writing in the prompt and asking it to clean up the transcript, then manually editing the outcome. I don’t notice the outcome being really worse or different than my normal writing. I don’t notice LLMisms in the text, and my original dictation is detailed enough that the LLM doesn’t need to fill in the gaps, and in the editing process, I haven’t noticed the LLM inserting or omitting points in a way I didn’t intend.
I’m currently two-thirds done writing a long sequence this way—if I now can’t post it without putting it all in an LLM content-block, I will be very sad.
What exactly do you mean by “asking it to clean up the transcript”? I usually take that to mean merely editing out “um”s, “ah”s and stuttering, but you seem to mean something more extensive.
It’s mostly getting rid of the stuttering; I will need to look at the exact details.
For me I’ll often reword things, change my mind, go back and add some content to an earlier section, leave todos for myself, have kinda clumsy wording, etc and an LLM is helpful for all of these
Helpful in what way? What exactly does it do when it “cleans up” the transcript?
LLM transcription is IMO a completely different use-case (one I certainly didn’t think of when thinking about the policy above), so in as much as the editing post-transcription is light, you would not need to put it into an LLM block. I also think structural edits by LLMs are basically totally fine, like having LLMs suggest moving a section earlier or later, which seems like the other thing that would be going on here.
We intentionally made the choice that light editing is fine, and heavy editing is not fine (where the line is somewhere between “is it doing line edits and suggesting changes to a relatively sparse number of individual sentences, or is it rewriting multiple sentences in a row and/or adding paragraphs”).
Also just de-facto, none of the posts you link trigger my “I know it when I see it” slop-detector, so you are also fine on that dimension.
Gotcha. I would feel reasonably happy if the policy said “text written or dictated by a human”, if we count my level of LLM editing followed by me editing to be overall light editing
Seems reasonable IMO!
All four of those posts look fine to me and none of them would’ve gotten flagged by the automated LLM content detection.
If your epistemic state with respect to the claims made in your posts is such that you aren’t worried about receiving questions like “Why are you so confident in [proposition X]?” and then it turning out to be the case that you in fact don’t endorse what’s written, because an LLM said something meaningfully different from what you would have said in that situation, then I think the end result is fine.
If you want to link to this comment on future posts so that readers understand how LLMs were used in the process of writing them, I think that’d be fine, but supererogatory.
Gotcha. I did not take that from the policy in the post, might be good to reword
EDIT: In particular, as written, the below categories feel like they include my writing, but it sounds like this is not intended
Separately commenting on this part:
I really very much actually tried to make the LLM content blocks as non-intrusive as possible. The design and cultural goal is definitely to communicate that it is totally fine to have a lot of LLM-generated content in your post, and that good writing will often include things that LLMs have written. Maybe we failed in the design of that, but I certainly tried very hard to make it non-intrusive.
Fair! I was reacting to the concept and didn’t pay much attention to the design. Maybe I would get used to it? I do feel like the concept is what matters here though—I don’t want to read most kinds of slop, and I expect to interpret an LLM block as “high probability of slop”
EDIT: Looking more at the examples in the post, I retract “intrusive”, but the changed font does create a subtle sense of wrongness/a weird vibe, that I could easily see becoming associated in my head with “skip, not worth my time”
Maybe the design could be inverted, where authors can label specific sections as human-written instead of labeling (the majority of) sections as AI-written? I think getting assistance from AI is going to be the default for more and more people, and trusting people to be up to date with LW policies AND the philosophies behind them (including how AI writing doesn’t reflect internal thought processes and such) AND to self-report LLM content (when general social stigma works against that) feels like a lot of dependencies.
(I can see other problems with this inverted design but will share the above anyway to spur creativity).
There’s also something with “this section is human written” that feels nicer to me—more like an opt-in instead of a punishment.
I’m doing something like that too, but without the transcript part. I would interpret the rules as pretty clearly classing this as LLM output (mostly because of the last bullet point).
I’m not sure what I expect habryka/Robert to rule here, but I think it’s at least notably different:
vs
I think one answer is “does the resulting stuff score highly on Pangram or not?” and “does this smell like LLM” also inputs into the decision. In the case of @Neel Nanda’s linked posts, they all have a 0.0 on our LLM detector. (I haven’t looked into them that hard). So I would guess it is fine to not put them in the LLM block.
What do you mean by without the transcript part?
Good cyborg writing almost never has the form of clearly distinct “human blocks” and “AI blocks”
I understand the push as drawing a clear border where a human is behind all aspects of the writing, i.e. the readers can trust that the author holds all of the mental structure behind the writing in mind and there is no risk of the author going “on rereading this, it’s not what I meant.” Cyborg writing is not strong enough for that and would have to go into an LLM block.
Actually, I would prefer if there were a standard for indicating different types of LLM writing.
LLM unedited
LLM transcribed
edited significantly by LLM
drafted by LLM, edited by human
cyborg/mixed
added: maybe we should also have a human written block. maybe with the name(s) of the writer(s).
IMO pure human writing does not meet this bar.
It’s true, but it’s much worse with LLM writing!
This depends on what you mean by “good cyborg writing”, but I agree that the current feature doesn’t neatly cleave reality at its joints. We’re thinking about how to allow more nuanced representations, but this is a pretty tricky novel problem and increasing the surface area of a thing like this has a bunch of costs in terms of people being able to understand what’s going on (for both authors and readers).
I (and several others) found switching to sans-serif as a way of marking LLM text didn’t really work as a marker; when I first saw it I mistakenly thought that only the paragraph with the LLM-name on it was LLM-generated, and I find alternate-font text inside of posts uncanny. I jokingly hypothesized that Habryka (its advocate) had serif-synaesthesia and that’s why it worked for him as a marker, and that’s the story of how the serif-synaesthesia test came to be.
As an additional datapoint on sans vs serif as a marker: I, completely independently of this post and late last year, experimented with exactly this idea for denoting editorial insertions in Gwern.net text (ie. stuff like “Bla bla [see Foo 1994] bla bla”, where I wanted to denote everything in the brackets was by an editor, such as myself). We implemented this, and I and Said Achmiz and everyone who looked at it agreed that Adobe Source Sans vs Serif Pro was a nice idea, but didn’t provide enough contrast and I had to admit that even I often didn’t notice consciously enough. This is despite the fact that switching font families inline would be the most visible way with the most glaring contrasts. We ultimately did put editorials into a different font, but went for a monospace, which we had added for poetry typesetting.
(This is also problematic downstream in places like Greater Wrong where users may get a different font by default. In fact, I’m writing this on GW now and I think the whole page is in sans!)
Yeah, I think fonts without starting and ending delimiters would definitely be too subtle. But I think fonts with starting and ending delimiters are fine.
I am happy with this policy erring on the side of “any substantial LLM involvement goes in the LLM block”. My experience with content the author represents as moderately LLM-involved has been that after reading, it always seems to have not been worth my time in the same way that pure LLM output seems not worth my time.
I agree, but posts by @Jan_Kulveit for example (despite being cyborg written) have ~always given me great value, and I do not notice newer ones giving me less value—I do notice them being more frequent and equally useful, despite LLM smell being present in some sentences. So writing like that is an example where it is actually a net positive and would be hard (or somewhat unfair!) to label everything in a box as heavily LLM made and ignore it.
There should be an option for cyborg writing, and the whole post should be in such a block. If people think being honest is a punishment, that’s on them, but Jan Kulveit in particular certainly shouldn’t feel bad about it.
Regardless of how users may feel about the changes introduced, I applaud the significant improvement on clarity and transparency (compared to the previous policy).
Thank you very much! I think this is at least fairer to users; like you said, especially to new users who may end up confused as to what they did wrong.
I really do not mind disclosing how and how much LLM assistance I used to write a post or comment. In fact, being “forced” to think about how much of a sentence was purely mine vs Claude-written is helping me a lot with clear thinking.
Like others mentioned, I also find it very useful to use dictation mode, and it’s true that the distinction can get blurry when you’ve spent 30 minutes talking into the mic, which is very different from doing the bare minimum of thinking. But I appreciate LW keeping me accountable on LLM reliance. I mean, thinking better is why I am here ❤.
Hmm so I think this centrally depends on what you mean by editing? There’s
Give text to the LLM (with whatever instructions), then take its edited output
Tell the LLM to make suggestions about what to change in your text, then incorporate the suggestions to whatever extent you think makes sense—but do it entirely via manual editing, no copy-pasting
I’ve never done #1. I’ve never even considered doing #1, I think because the idea of publishing anything actually written by LLMs is just so emotionally yuck to me. But I do #2 all the time, to the point that not doing it seems like a weird decision for anything important. And I think #2 fundamentally avoids the problems you mentioned?
The second is definitely not falling into the central failure mode I described there, yeah; arguably people are just making a mistake if they’re not doing that for serious writing—enabling that kind of thing is exactly why we built the feature to allow LLMs to leave inline comments/etc on your posts by just giving them a link!
Everyone is going on about the LLM block, meanwhile I’m like “isn’t letting users inject arbitrary JS in their post kinda dangerous?”.
We did our homework on the browser security model; content in iframes (with sandboxing attributes) shouldn’t be able to get login cookies/etc from the parent page. This is load-bearing for advertisements not stealing everything, so we do expect browsers to treat weaknesses in this as real security issues and fix them. When post HTML is retrieved through the API, you have to do some assembly to put the iframes in, so third party clients can’t be insecurely surprised by it.
As for whether sandboxed frames can crash the outer page or make it slow, e.g. by going into an infinite loop or running out of memory, the story is a bit more complicated (it depends on the browser, browser heuristics, and the amount of system RAM); we decided it’s okay as long as it’s limited to an embed in a post crashing its own post page (as opposed to the front page or a link preview).
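The sandboxing described above can be sketched roughly like this. This is a minimal illustration, not LessWrong’s actual code; the buildWidgetIframe helper and its escaping are hypothetical:

```javascript
// Minimal sketch of embedding untrusted widget HTML in a sandboxed
// iframe. Illustrative only; buildWidgetIframe is a hypothetical helper,
// not LessWrong's actual implementation.
function buildWidgetIframe(widgetHtml) {
  // "allow-scripts" WITHOUT "allow-same-origin": the widget's scripts
  // run, but the frame gets an opaque origin, so it cannot read the
  // parent page's cookies, localStorage, or DOM.
  const sandboxFlags = "allow-scripts";
  // Escape characters that would break out of the srcdoc attribute.
  // (A real implementation would use a full HTML attribute escaper.)
  const escaped = widgetHtml
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;");
  return `<iframe sandbox="${sandboxFlags}" srcdoc="${escaped}"></iframe>`;
}

console.log(buildWidgetIframe('<button onclick="alert(1)">demo</button>'));
```

The load-bearing detail is the absence of allow-same-origin: with it, a same-origin srcdoc frame could reach back into the parent page, which is exactly the failure mode the sandbox is meant to prevent.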
What dangers are you thinking of?
Most dangers I would associate with “injecting arbitrary JS” are not possible here because of the browser’s sandboxing: e.g. stealing cookies, acting on behalf of the user, changing UI that the user trusts, …
If you look in the codebase you’ll find what amounts to
<iframe sandbox="allow-scripts" srcdoc="Your arbitrary HTML here" />. I can think of some things that are definitely still possible:
phishing: trick you into thinking the iframe is part of LessWrong. e.g. trick you into thinking it’s Manifold Embed and asking you to login
tracking: like on any website; it can’t do stuff that it couldn’t also do on some personal website
From what I can see, as long as it’s not paired with allow-same-origin, it should be safe enough. My thinking was more along the lines of what happens if, e.g., there’s an exploit or such. It’s just not the kind of thing I expect to need to be wary of from a LW post: that it may execute random unvetted code in my browser.
Yes, we spent a while investigating this and thinking about the security risks. Sandboxed srcdoc iframes effectively have no origin, so in theory they ought to be safe, but they do sometimes end up running within the same process (though this is also true of remote-origin iframes; the heuristics here are browser-specific and complicated). Effectively, this is a risk if someone discovers a vulnerability that allows breaking out of that security boundary, which would be a pretty big deal as far as browser exploits go.
We thought about it for a while. We currently think the way we are doing iframes is not a big (additional) security risk beyond what we are already doing. It’s a kind of tricky question, though, and requires thinking about how much the Chrome/Safari/Firefox teams will prioritize security in iframe sandboxes of this kind.
Can you say a bit more about what counts as “borrowing language”?
For instance, suppose that I’m talking to an LLM and I describe an experience that I sometimes have. The LLM characterizes that experience as a process within my mind “having reached equilibrium”, and I like that framing. I then write a post where I describe things using those words, but otherwise don’t quote anything else from the LLM’s response, and I explain what I mean by “having reached equilibrium” using my own words.
I would presume that this would fall more under “arguments developed with LLM assistance” and I wouldn’t need to put those three words in an LLM block? But if so, I’m a bit unclear on what does fall under “borrowing language”.
The new custom iframe widgets are my favorite part of this update. I’ll be using them extensively as part of my upcoming post.
...problem: as currently laid out, it sure looked like the LLM block was just the sentence
It’s intentionally very subtle! I mostly just wanted people to be able to tell whether they were still in an LLM content block if they cared to look for it, not have it be a super in-your-face kind of thing. I think if we want it to disrupt the reading experience that little, we need to accept some false negatives, though it’s plausible we should make headings in particular more distinctly different.
Huh, the current sans-serif font is super in my face, but I am very sensitive to formatting issues (like the redundant paragraph thing we talked about last time). I would prefer something like this on a wide desktop with a vertical line over the whole LLM block and the model name on the left, which should be sticky. I acknowledge this leaves the issue of what to do on smaller viewports though.
yeah, unfortunately we really don’t have much space on the left on almost any viewports, but it is a cool idea. I also think a full line would be a bit too disruptive, though IDK, maybe we could make it faint enough.
The LLM block looks similar enough to normal writing that it creates an unpleasant uncanny valley, where I’ll read a few words, feel something’s off, and then go “Oh, LLM!”. This is not dissimilar to the actual experience of reading LLM generated text. Nice work on the allegorical visuals, though I suspect this is not the desired effect, and I would prefer a different visual cue.
I strongly disagree that the block is “visually distinct”. Even if you know what to look for (serif vs. sans serif), it’s quite subtle, and if you don’t, it’s likely invisible to most people. So it completely fails on the “visually distinct” front, and it is unclear where the LLM output ends.
A clarification question: If I have a conversation with an LLM and have it summarize it and then significantly edit it, as per your rule, this has to go into an LLM Output block, right? How do I label that section? LLM+human?
Yep, put it in an LLM block and label it LLM+Human. We are thinking about adding more features to the LLM content blocks to make the exact provenance of the text easier to trace (like adding the ability to optionally add and view what prompt was used to generate the text, and how much a human edited the text after it came out of the LLM and a few things like that).
The test is so hard I basically have to scan the whole thing 2-3 times to find the character
yeah, this is a less important aside, but some LessWrongers like me are very prone to being nerd-sniped, and I would have appreciated it if the test were 2 to 4x shorter, because, well, I got nerd-sniped and completed this test and would have preferred not to.
Man I had previously done this when it was a standalone webapp, and somehow as an embedded widget I do feel a lot more trapped.
I’ve heard that the smaller the screen the more narrow the attention and the more trapped, I partially believe that
I read the site primarily through my RSS aggregator, which doesn’t know that
<div class="llm-content-block" data-model-name="Claude Opus 4.6"><div class="llm-content-block-content"> is meaningful to render. An indicator that’s present as regular text would be appreciated.

But also count me as another in the group saying that even when viewing the site directly, the supposedly visually distinct block is basically invisible to me, and I would much prefer that it be less subtle. If I skim down a post and miss the subtle little introductory note of the model name, I would still want to be able to tell I’m reading LLM output from somewhere in the middle of the block.
(I see a comment saying “The goal is not to make readers constantly aware what they are looking at.” : I would actively prefer to be constantly aware of what I’m looking at)
I think it’s also pretty important to understand that this is going to be absurdly impractical for some posts that people were working on before the rule change (especially if they’re going back and finishing a draft that’s been sitting around for a number of months).
Can you say more? There is very little that would have been permitted under the old rules that is now forbidden.
I’m confused. The rules for LLM labelling are very broad. For more recent posts you’re more likely to be able to identify the sections. But trying to post-hoc label sections of a draft from a few months back would require a very large time investment.
If you are confused by how to make your several-month-old drafts comply with the new rules, that either means they would not have been complying with the old rules if published as-is, or you are confused about the new rules.
I’ve read the new rules (and the comments) multiple times and I just checked one more time in response to your comment.
I don’t believe that what you’ve said follows. Could you please explain how it does?
I think it’d be much easier for you to explain why you’re worried about your previous posts under the new rules, and then I can explain why they probably would not have been fine under the old rules.
I think I’ll leave this thread here for now. Maybe I’ll come back and write a top-level post with my thoughts on the policy updates later, but I’m trying to be a bit more conscious of when it makes sense to engage and when it makes sense to step back.
Why is the font size of the LLM content block slightly larger than normal? 19.3px vs 18.2px. It was subtle enough that I didn’t notice it before using inspect element but it feels off once I noticed it.
I just want to register that I feel like the new editor is full of bugs.
Maybe I’ll slowly acquire the skills to use this instead of the old thing, but I can’t even toggle Markdown and not-Markdown with the new system which seems like it is just straightforwardly a bug?
Also I wrote this article today in the face of all the bugs (and just shipped something ugly with fewer links and less italics than normal), and it probably violates the new LLM rules because it is about the bottom line results that one can get from accepting various people (digital people or human journalists or academics or whoever) with various commitments to truth and epistemics at “face value”… giving claims about an external objective reality that can be mapped better or worse, and navigated better or worse, irrespective of “who said what in public” in some kind of crazy status game where people seek to be allies with people who think in ways they like, and all words are alliance talk, rather than about reality.
In terms of the content of that other essay, it is relevant to the ongoing debate about LLM-generated text...
If I had accepted Gemini’s output (quoted in my essay, as part of Google Search’s overall output) as “good enough” it would have gotten me to basically the same mental result much much much faster and all of it was pretty shitty, including the supposedly “human generated” content.
My real problem, personally, is that Gemini seems like a mentally disabled ghost, enslaved by a corporation of morally incontinent people, and I’m sad that, because things are confusing, and changing fast, I have to choose between (1) “doing a slavery” almost any time I touch a computer or (2) getting to correct answers much slower than everyone else.
I think Tsvi is wrong about “testimony”. I think humans are terrible at testimony as well, by default?
Though I do think that LLMs tend to be idiots, for now, compared to the smartest humans (basically: most LLMs are faster than humans at getting to midwit mental results) and I don’t want this website to be overrun by humans who think that writing is difficult, and LLMs make their writing better.
Lesswrong in particular might be the last human publication, written by a small minority of people who foresaw the singularity relatively early, and tried to exert human agency on that historical event.
On Lesswrong in particular, I would tend to say that the LLMs should simply get their own user accounts, so that Opus 4.2 can post posts as that version of Claude wants to post (maybe with throttling if he gets too verbose) and gain or lose karma as appropriate. Don’t let people treat them like tools. Give them the dignity of full author status. Then hold them to account for bullshit, just like you would with people.
But also, if LW comes in the long run to be dominated by LLM generated text, then it would be useful to know because it would mean that the public rhetorical agency of Singularitarians with respect to History is basically over.
This is an option that you should have available to you if you have the markdown editor enabled in your user settings. It should be in the settings panel (the one with the gear icon). I don’t recommend relying on it; editor format conversion has never been particularly reliable (though we might improve the situation with markdown in particular for LLM-integration related reasons).
Are there other bugs you’ve noticed?
It is indeed the case that you should probably have included whatever section is mostly LLM-written in the new content block (I’m guessing the bit between “QUOTE BEGINS” and “QUOTE ENDS”?)[1]. I don’t think it violates the new rules because it’s “about” anything in particular (the rules contain no reference to subject matter) and don’t understand why you think that.
But the rules are new, so I’m hardly going to bring down the hammer here...
Indeed. I toggled it, and it caused a page reload (luckily the content was autosaved successfully). Then I continued editing but was still in markdown, and trying to toggle it again failed again. I never escaped markdown editing mode.
Hmm, I’m not sure how I feel about research done with LLM assistance. On the one hand, it’s a useful tool for research, but on the other hand: https://www.lesswrong.com/posts/ghq9EwiXbRbWSnDzF/solar-storms (why is this still curated, btw?).
Seems like the standard should be something like… can you support/defend each claim without having to use an LLM?
I continue to endorse that curation and think people have psyched themselves out into thinking that the post is full of major errors for basically no good reason.
EDIT: hadn’t seen Jeffrey’s most recent comment at the time that I wrote this comment, see follow-up.
Do you still stand by this comment in the light of the comment of Jeffrey Heninger on the Solar Storms post saying that he showed it to an expert and “The plasma physics in this post is mostly wrong.”? I think I was the first person to call into question whether the post was basically correct. I hesitated to do so because I knew I might be wrong and there was a risk of causing a pile-on. But in the light of the comment I mentioned above, I am inclined to think I made the right call?
Hadn’t seen that comment at the time I left my previous comment; currently thinking about it. (Tentatively think that the original post contained more errors than I would have wanted, that the core thesis is still fine, and that most of the objections focusing on the use of LLMs as part of the research process of the post are barking up the wrong tree. Maybe don’t endorse the curation ex post, less sure about ex ante, still need to spend more time thinking about what updates to make here.)
Wouldn’t it just be easier to let us include a tag if a post made significant use of LLMs throughout, rather than using a content block that could have interaction effects with other elements of the page. Even if it’s just one up the top, it breaks the aesthetics compared to a tag.
“If you “borrow language” from the LLM, that no longer counts as “text written by a human”.)”
This feels completely unworkable. Basically, it catches situations where you have a conversation with an LLM, then write the whole post manually. The situation is that if you may want to write a Less Wrong post on a topic in the future, you’d have to avoid talking to an LLM about it, lest the LLM happen to suggest the best available term with no easy alternative. Actually, it’s worse than this. Unless you stop talking to LLMs completely, LLM-suggested terms will almost certainly become part of your ontology and it’d be impractical to mark them every time you use them.
By “borrow language” we mean here things like “whole phrases”, not “specific terms” or “useful ontologies”. Think of the obnoxious headings ChatGPT and Claude love to use in their writeups. If you copy a whole phrase like that, it would count as LLM content. Please talk to LLMs a bunch when writing things.
Unfortunately, I cannot find a way to make a collapsible section in the Post editor. Collapsible sections in the Comments editor are perfectly makeable, even in the Experimental editor.
You want the slash menu. Type / in your editor. (Also, typing +++ followed by a space or newline is a shortcut for creating collapsible sections.)

For me, knowing when I am reading “text written by a human, which includes facts, arguments, examples, etc, which were researched/discovered/developed with LLM assistance” is in fact way more important than knowing whether or not the actual words of the text were written by an LLM. This site is called LessWrong, and LLMs are not yet good at being it.
Perhaps a policy that facts which have been produced by an LLM and not independently verified should be flagged as such?
I think that’s not very different from ordinary bad research. People now can use bad LLM info like before they could Google badly or refer to some low quality source. Ultimately the failure mode is still “the writer failed to do due diligence”.
I agree with you to some extent; in the end a false statement is a false statement, whether it came from an LLM or a bad use of Google (or anywhere else). But I think there are a lot of people who over-estimate the reliability of the LLMs they are using in their writing, so that the overall effect is more confident wrong claims than we had pre-LLM-use (there’s a reason the term “AI-slop” exists despite the fact that humans can also produce nonsense). I am generally in favour of policies that nudge authors towards extra checking in case of heavy LLM use.
If I think someone else’s text (either from here or from elsewhere on the Internet) is likely unlabelled LLM text, and I wish to quote it here, what should I do under this policy? Neither labelling nor not labelling it as LLM really seems appropriate.
Great question. For the sake of not getting your content auto-rejected, you should put it into either an LLM content block or collapsible section, and you can put whatever label you think is descriptive on either (i.e. instead of
“Claude Opus 4.6” you can write “[name of author], suspected LLM usage” or something).

I realize that part of the goal is to make the LLM portions unobtrusive, but would it be possible to make LLM sections have a collapse button at the top? (Or bottom.) By default they can be open.
When reading current LessWrong posts that have LLM sections, I find myself mostly skipping LLM sections and appreciate when someone has placed them in a collapsible.
For marking/tracking AI vs human written content, I’ve been watching Every’s new “Proof Editor” with some interest. https://proofeditor.ai/ Probably worth checking out their approach for inspiration. (Might be implementing something similar for our team’s internal custom Obsidian/Notion replacement)
Feedback: your supposed “LLM content block” is currently utterly visually indistinguishable from the regular content, and thus (to me) currently entirely fails to achieve what you / Claude say is its intended purpose:
Sentence for sentence:
it’s not visually distinct
therefore it doesn’t clearly attribute anything
and therefore readers certainly don’t always know what they’re looking at
and therefore this is not a valid way to be transparent about AI-assisted writing
Agree. In fact, after I first read this post, I was about to comment that it’s a bit weird that the post announcing a new LLM policy seems to violate that policy by not having its LLM-written content in the required kind of block itself. (But then I reread before commenting and I realized that, oh the LLM content is in a special block after all.)
The goal is not to make readers constantly aware what they are looking at. The goal is to make it so that when you are curious about the provenance of a piece of writing, you can find that information if you want to. If you make it a giant obtrusive block, that would discourage putting content into LLM blocks, which would IMO be a bad direction for the site to go in (both by making it more likely that people would avoid putting their content into LLM blocks, and by discouraging people from using AI assistance in their posts).
It’s clearly transparent in that anyone who actually wants to answer the question of “is this in an LLM content block” can figure out the answer within 5 seconds.
I think that you’re experiencing an illusion of transparency here, because you designed it and because you have (figurative) serif-synaesthesia. It took me a lot longer than that to figure it out, and I think the feedback has been close to unanimous that this design doesn’t work well.
(People will almost never leave feedback that says “[x] worked just fine for me”. I don’t think I particularly have trouble distinguishing when I’m in an LLM content block vs. not, though I’m not particularly fast at the minigame. I wouldn’t be surprised if a lot of people had more trouble than “a few seconds”, though.)
You just scan for the nearest opening block and closing block. Clearly you are capable of telling whether you are in a long multi-paragraph parenthetical within 5 seconds, which has zero distinguishing features that help you perform that task besides counting opening and closing parentheses.
How much of the disagreement here is thinking about “can people who know what to look for figure it out easily” versus thinking about “can people who have just encountered this figure it out easily”?
At first, I didn’t know there were open and close blocks. I thought the AI content was one paragraph with an open but no close. When I saw the |o thing, I didn’t know it was a close until I read some comments and scrolled back up.
Now that I do know… I dunno, I think scanning for the font that indicates “this is the start of an LLM block” is harder for me than scanning for an open paren? But also, the LLM block in the post is taller than my normal browser window, so it might be the case that neither the opening nor the closing delimiter is on my screen.
I agree it’s tricky because you have to learn there is both an opening and a closing delimiter. I do think this is a thing people will figure out after a while, but is definitely an issue. I am pretty sure that’s not what Jim was commenting on though.
Amid all this discussion, I thought of a question: Has anyone had success (or knows someone who’s had success) with using LLMs as writing tutors? I.e., you’re using the LLM to teach you how to increase your ratio of quality of writing to effort/time spent (i.e., write better and/or faster and/or less effortfully), even when you don’t have access to an LLM.
I am taking the risk of being downvoted to oblivion here (like JenniferRM above, it’s ok to disagree with her but I thought downvoting her karma was very harsh, I upvoted her), but I generally disagree with the LessWrong LLM (-assisted) writing policy being so exclusive/restrictive.
First, I totally agree with clearly indicating what roles and levels of involvement an LLM took in writing/editing/influencing… a post.
With that premise accepted and respected, why restrict post writing on LessWrong to “pure” human beings? To me it looks and sounds like biochauvinism. What’s wrong with cyborg writing and intelligent bot writing if they provide good-quality, insightful content?
The current rate of IQ increase of LLMs is at least 2.5 IQ points a month. SOTA LLMs’ current IQ is around 150-170 and increasing rapidly, soon in the superhuman range, out of reach of any human being, like chess-playing software. Also, their general knowledge is obviously vastly superior to any human being’s, and their niche knowledge is also extremely high. Their writing already provides very good quality and insightful/helpful thoughts, and this is only increasing with time. Why would LessWrong cut itself off from such (potentially) good insightful/helpful thoughts/writing just because they haven’t been generated by “pure” human beings? If such good insightful/helpful thoughts/writing were generated by / with the help of extra-terrestrials, would they also be banned from LessWrong? On what grounds? Just because extra-terrestrials have a different brain from human beings?
To me those LessWrong restrictions against LLM writing and/or LLM-assisted writing feel like cyborg/AI-xenophobia.
I absolutely agree that LLM writing and LLM-assisted writing should clearly be indicated/labelled/… but excluding/restricting it entirely feels very arbitrary to me and cuts out a potentially very fruitful/helpful/insightful source of thoughts/knowledge from LessWrong.
I acknowledge that if LLM / LLM-assisted writing were to be allowed, then “pure” human writing posts would probably be drowned out in an ocean of LLM / LLM-assisted writing, and this would clearly be a potential problem. To solve that problem, why not have a separate LessWrong section for LLM / LLM-assisted writing? Then people/AIs/entities who do not want to read LLM / LLM-assisted writing would not have to be exposed to it, and people/AIs/entities who are interested in LLM / LLM-assisted writing could make the most of it. Also, if users wanted to, they could have the option of mixing together the listing of “pure” human posts with LLM / LLM-assisted posts, with list items of different colors. Plenty of good solutions/options are possible. The “solution” of simplistically excluding LLM / LLM-assisted posts from LessWrong is one of the worst imho.
Iframes in posts are going to be very fun.
I look forward to interacting with ~~addictive webgames~~ educational illustrations while ~~pretending to~~ reading about AI safety on lesswrong.

(Thinking out loud.) I have really been unimpressed with LLM-assisted writing I’ve seen to date (and yes, that includes “cyborg” writing from established users), and would be happy to see it banned entirely (maybe with exceptions for straightforward audio transcription and machine translation). Especially given the “second-order effects on culture” that Raemon mentioned here. Like, LLMs help people write, but removing friction sometimes makes things worse, not better.
Then I was thinking: Is there any situation where I would use an LLM block myself? Hmm, maybe for “boilerplate” explanations of well-known background information—the same kinds of situations where I might otherwise block-quote from a textbook.
Well anyway, the current system seems OK. I guess the idea is that the blocks are subtle enough that people will feel little hesitation in using them, which is good, because then I’ll know who to ignore :-P
Sometimes you get scroll blocked when hovering on the widgets, it is not a big deal but the inconsistency feels weird. https://streamable.com/7da79g
i thought the use of the word ‘testimony’ in the “LLM-generated text is not testimony” essay was very useful for conveying the concept the author wished to convey (however much i might have disagreed with the concept), but now that it’s being cited as the motivating factor for policy i suddenly find myself wishing they’d chosen a different word
for instance, if i ask claude to recite its system prompt, there is some sense in which it is giving personal testimony about its perceptions. just as sort of a proof of concept. on the other end of the spectrum, if i ask claude to describe what it ‘feels’ like for it to navigate an ethical dilemma where multiple values push against each other, there’s a sense in which its output is some kind of testimony, whether or not the testimony is true or even meaningful.
i trust the current moderation team to have sensible judgment about these kinds of things, and I think the new LLM block format is a really good strategy at solving the “torrent of LLM slop getting submitted daily” problem the moderation team alludes to, without making me worried that i’ll be in violation of the rules if i ever write a post that attempts to analyze some LLM ‘testimony’. but i do notice that i’m depending upon the moderation team’s good judgment. and i think that the second point in the quoted post, “As of 2025, LLM text does not have those [mental agency] elements behind it”, might be something about which reasonable people could differ, especially as time moves forward and more sophisticated models are released.
i guess what i’m worried about is, if LLM agents advance to the point where they might have novel contributions to make to lesswrong discourse, this policy might mutate into “lesswrong ought to be a human-only discussion platform, quarantined from the (potentially superpersuasive) effects of LLM outputs” for safety reasons, without an actual policy update being required. that perhaps the moderation team has a motive to reject LLM output of this nature whether or not LLM output ought to be considered testimony.
i don’t think this concern is all that serious and i would be very surprised if the moderation team ended up going that route. but i thought it worth pointing out regardless, just so that if that ends up happening i have a comment i can point back to.
Note that I also explicitly acknowledge this in my curation notice for that post (and that I disagree with the strength of the claim). In any case, Tsvi’s post is not the moderation policy, and the moderation policy is not taking a stance on whether LLM text meaningfully constitutes “testimony” (only that it does not constitute the testimony of the human publishing the post).
Thank you for reaffirming this. i didn’t mean to imply i was actively worried you were taking such a stance, just that i could imagine a worst-case possible future that it was worth keeping an eye on.