Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
Yeah, I think fonts without starting and ending delimiters would definitely be too subtle. But I think fonts with starting and ending delimiters are fine.
I agree it's tricky because you have to learn that there is both an opening and a closing delimiter. I do think this is a thing people will figure out after a while, but it is definitely an issue. I am pretty sure that's not what Jim was commenting on, though.
I wish it were the latter, but my current sense is that a bunch of high-karma users have been making mistakes in this direction as well (less often than new users, but still too frequently).
Ah yeah, we should sync up the spacing between editor and published content here.
You just scan for the nearest block opening and closing. Clearly you are capable of telling within 5 seconds whether you are in a long multi-paragraph parenthetical, which has zero distinguishing features to help you perform that task besides its opening and closing parentheses.
The goal is not to make readers constantly aware of what they are looking at. The goal is to make it so that when you are curious about the provenance of a piece of writing, you can find that information if you want to. If you made it a giant obtrusive block, that would discourage putting content into LLM blocks, which would IMO be a bad direction for the site to go in (both by making it more likely that people avoid putting their content into LLM blocks at all, and by discouraging people from using AI assistance in their posts).
It’s clearly transparent in that anyone who actually wants to answer the question of “is this in an LLM content block” can figure out the answer within 5 seconds.
Yeah, unfortunately we really don't have much space on the left in almost any viewport, but it is a cool idea. I also think a full line would be a bit too disruptive, though IDK, maybe we could make it faint enough.
It's intentionally very subtle! I mostly just wanted people to be able to tell whether they were still in an LLM content block if they cared to look for it, not have it be a super in-your-face kind of thing. I think if we want it to keep disrupting the reading experience that little, we need to accept some false negatives, though it's plausible we should make headings in particular more distinctly different.
Seems reasonable IMO!
I have been surprised by how bad people are at assessing whether this is actually true, but I do think it’s roughly the actual standard I have for putting content into LLM content blocks.
I would be fine with people messaging us on Intercom before publication and being like “hey, this was more heavily AI-edited but I do actually stand behind it all in testimony, can you sanity-check that that seems right to you?”, and then we can give people permission to skip the LLM content blocks. This does seem like a bit of a pain for the people involved, but I don’t super know what else to do.
I would feel better about eg self selecting a tag for the post about how much an LLM was integrated into the writing process, with a spectrum of options rather than a binary
FWIW, this wouldn't achieve approximately any of the goals of the above policy. The whole point of the policy is to maintain speech as testimony on LessWrong. Having a post that is "50% AI written" basically doesn't help at all with that. LessWrong posts should frequently and routinely refer to internal experiences like "I was surprised by X" or "Y felt off to me", and if the LLM wrote a section with those kinds of phrases, usually no amount of editing will restore meaningful testimony. So a post that mixes internal experiences an LLM made up with experiences the person actually had is failing on this dimension, even if labeled as such.
Separately commenting on this part:
On reflection, the thing that annoys me about this policy is that it lumps in many kinds of LLM assistance, with varying amounts of human investment, into an intrusive format that naively reads to me as “this is LLM slop which you should ignore”.
I really very much actually tried to make the LLM content blocks as non-intrusive as possible. The design and cultural goal is definitely to communicate that it is totally fine to have a lot of LLM-generated content in your post, and that good writing will often include things that LLMs have written. Maybe we failed in the design of that, but I certainly tried very hard to make it non-intrusive.
Yep, put it in an LLM block and label it LLM+Human. We are thinking about adding more features to the LLM content blocks to make the exact provenance of the text easier to trace (like the ability to optionally add and view what prompt was used to generate the text, how much a human edited the text after it came out of the LLM, and a few things like that).
By "borrow language" we mean things like "whole phrases", not "specific terms" or "useful ontologies". Think of the obnoxious headings ChatGPT and Claude love to use in their writeups. If you copy a whole phrase like that, it would count as LLM content. Please do talk to LLMs a bunch when writing things.
We thought about it for a while. We currently think the way we are doing iFrames is not a big (additional) security risk beyond what we are already doing. It's a kind of tricky question, though, and requires thinking about how much the Chrome/Safari/Firefox teams will keep prioritizing security in iFrame sandboxes used this way.
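(For anyone curious what that kind of isolation looks like in practice, here is a minimal sketch of a sandboxed iframe embed. This is not our actual implementation, just an illustration; the component name and styling are made up.)

```tsx
// Minimal sketch (assumed names, not LessWrong's actual code) of embedding
// third-party content in a sandboxed iframe. Without "allow-same-origin" in
// the sandbox list, the embedded document runs with an opaque origin and
// cannot read the parent site's cookies, localStorage, or DOM.
import React from "react";

export const SandboxedEmbed = ({ src }: { src: string }) => (
  <iframe
    src={src}
    // Scripts may run, but only inside the isolated (opaque-origin) sandbox.
    sandbox="allow-scripts"
    // Avoid leaking the embedding page's URL to the framed content.
    referrerPolicy="no-referrer"
    style={{ width: "100%", border: "none" }}
  />
);
```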
It’s true, but it’s much worse with LLM writing!
LLM transcription is IMO a completely different use-case (one I certainly didn't think of when thinking about the policy above), so insofar as the post-transcription editing is light, you would not need to put it into an LLM block. I also think structural edits by LLMs are basically totally fine, like having LLMs suggest moving a section earlier or later, which seems like the other thing that would be going on here.
We intentionally made the choice that light editing is fine, and heavy editing is not fine (where the line is somewhere between “is it doing line edits and suggesting changes to a relatively sparse number of individual sentences, or is it rewriting multiple sentences in a row and/or adding paragraphs”).
Also, just de facto, none of the posts you link trigger my "I know it when I see it" slop-detector, so you are also fine on that dimension.
Yeah, it's on my list to add this back in, just haven't gotten around to it. I want to make a few changes to the post-item design on the profile page in general.
To help mitigate this issue, early, voluntary if-then commitments can contain “escape clauses” along the lines of: “We may cease adhering to these commitments if some actor who is not adhering to them is close to building more capable models than ours.” (Some more detailed suggested language for such a commitment is provided by METR, a nonprofit that works on AI evaluations.)
Just for reference, this framing is what makes me feel fine about the things you said on this topic, but not fine about the conversations I've had with Anthropic employees about it over the last few years. My conversations with Anthropic employees definitely did not involve them saying "we are committing to our RSP only if every other company also adopts a similar RSP".
At most they were saying "we are going to revise our RSP as we learn more about what an effective RSP would look like, and might make changes in accordance with that", which is of course drastically different. If the commitment all along had been to "commit to the RSP conditional on other people also committing to equivalent policies", then the RSP could have said that directly. The change from an unconditional to a conditional policy is of course massive (and I think the RSP as written was clearly communicating itself as an unconditional policy).
We apply higher standards to posts by new users (in particular, we say in the onboarding docs and on the new post page that your first post to LessWrong is kind of like a job application). This is just saying that a post co-written by AI will likely cause you to not meet that bar (though it's not guaranteed!).