I often write posts by dictating a verbatim rough draft, giving the audio to Gemini along with a bunch of samples of my past writing and instructions to preserve my voice as much as possible, and then editing what comes out until I’m happy (in practice it’s close enough to my voice that this is just light editing). Under these rules, would I need to put the whole post in an LLM output block?
EDIT: On reflection, the thing that annoys me about this policy is that it lumps in many kinds of LLM assistance, with varying amounts of human investment, into an intrusive format that naively reads to me as “this is LLM slop which you should ignore”.
For example, under my current reading, I would need to label several popular and widely read posts of mine as LLM content (my amount of editing varied from light to heavy between the posts, but LLM assistance was substantial). I think it would have been pretty destructive to make me label each post as LLM written (in practice I would have either violated the policy, or posted on a personal blog and maybe shared a link here)
https://www.lesswrong.com/posts/StENzDcD3kpfGJssR/a-pragmatic-vision-for-interpretability
https://www.lesswrong.com/posts/jP9KDyMkchuv6tHwm/how-to-become-a-mechanistic-interpretability-researcher
https://www.lesswrong.com/posts/G9HdpyREaCbFJjKu5/it-is-reasonable-to-research-how-to-use-model-internals-in
https://www.lesswrong.com/posts/MnkeepcGirnJn736j/how-can-interpretability-researchers-help-agi-go-well
I would feel better about eg self selecting a tag for the post about how much an LLM was integrated into the writing process, with a spectrum of options rather than a binary
FWIW, this wouldn’t achieve approximately any of the goals of the above policy. The whole point of the policy is to maintain speech as testimony on LessWrong. Having a post that is “50% AI written” basically doesn’t help at all with that. LessWrong post writing should frequently and routinely refer to internal experiences like “I was surprised by X” or “Y felt off to me”, and if the LLMs wrote a section with those kinds of phrases, usually no amount of editing will restore meaningful testimony, and so a post that just mixes LLMs that made up random internal experiences with actual experiences a person had is failing on this dimension, even if labeled as such.
Fair enough. How about “I stand by the content of this piece as much as if I’d written it myself”? In my case, most but not all of the phrasing and wording is written by me, and I would cut anything the LLM added that I considered false testimony
I basically don’t trust people to correctly make this call, especially as LLMs get smarter and more persuasive.
I certainly don’t trust the daily deluge of new users who have this in their posts yet are substantially producing slop.
If you don’t trust the user, why does the policy matter? Surely you need some way to gauge post quality regardless
I have been surprised by how bad people are at assessing whether this is actually true, but I do think it’s roughly the actual standard I have for putting content into LLM content blocks.
I would be fine with people messaging us on Intercom before publication and being like “hey, this was more heavily AI-edited but I do actually stand behind it all in testimony, can you sanity-check that that seems right to you?”, and then we can give people permission to skip the LLM content blocks. This does seem like a bit of a pain for the people involved, but I don’t super know what else to do.
Is this a problem where people in full generality are surprisingly bad at assessing LLM content, or is it more of a skill issue where we might expect the clever high-karma users to do it well and new users to be less trustworthy with it?
I wish it was the latter, but my current sense is a bunch of high karma users have been making mistakes in this direction as well (less than new users, but still too frequently).
Huh, that matches my experience that I’ve never noticed LLM-heavy writing done well, which is weird because from first principles it really seems like it shouldn’t be that hard for a good user to do.
I’m doing the same—verbatim dictating the text, giving the transcript to Claude with some of my past writing in the prompt and asking it to clean up the transcript, then manually editing the outcome. I don’t notice the outcome being really worse or different than my normal writing. I don’t notice LLMisms in the text, and my original dictation is detailed enough that the LLM doesn’t need to fill in the gaps, and in the editing process, I haven’t noticed the LLM inserting or omitting points in a way I didn’t intend.
I’m currently two-thirds done writing a long sequence this way—if I now can’t post it without putting it all in an LLM content-block, I will be very sad.
Update: Re-reading the cleaned-up transcripts, I’ve found them basically useless, and now I’m rewriting everything by hand. I think this is largely not Claude’s fault—I’m trying to explain complicated concepts in my posts, and my dictation was just not detailed enough to get everything across.
In any case, I wanted to post this update so as not to leave a false data point standing here.
That is quite an update. I am curious about how this happened: if you had done only 1 or 2 transcripts, I could understand eventually discovering enough severe subtle flaws as to render them completely worthless compared to starting over from scratch, but how did you get “two-thirds done writing a long sequence” while still being highly enthusiastic about stuff you were going to ultimately throw out as “basically useless” and you are now “rewriting everything by hand”? “Just not detailed enough” sounds like a strange explanation to me, because shouldn’t you have noticed that problem with the first transcript you looked at?
Part of the answer is that I was dumb and I should have realized even at the time of writing the original comment that the writing is not good enough.
But the main reason is that the posts try to bridge a pretty wide inferential gap. When I read the first transcripts, I felt like they were good enough because I was reading with the eyes of “this would be a reasonable explainer to myself from three months ago, I understand the points pretty well”. But on further consideration, I realized that this level of explanation will not be useful for basically any other reader. I didn’t know how to do the more careful explanation through voice recording. Perhaps this is a skill issue, but I think that voice transcripts are generally much better for writing things up for yourself than for a wider audience. So I needed to rewrite the whole piece more carefully by hand. The notes from the transcription were still useful as a skeleton to build on, but I think basically every single sentence got replaced by the end.
What exactly do you mean by “asking it to clean up the transcript”? I usually take that to mean merely editing out “um”s, “ah”s and stuttering, but you seem to mean something more extensive.
It’s mostly getting rid of the stuttering; I will need to look at the exact details.
For me I’ll often reword things, change my mind, go back and add some content to an earlier section, leave todos for myself, have kinda clumsy wording, etc and an LLM is helpful for all of these
Helpful in what way? What exactly does it do when it “cleans up” the transcript?
LLM transcription is IMO a completely different use-case (one I certainly didn’t think of when thinking about the policy above), so inasmuch as the editing post-transcription is light, you would not need to put it into an LLM block. I also think structural edits by LLMs are basically totally fine, like having LLMs suggest moving a section earlier or later, which seems like the other thing that would be going on here.
We intentionally made the choice that light editing is fine, and heavy editing is not fine (where the line is somewhere between “is it doing line edits and suggesting changes to a relatively sparse number of individual sentences, or is it rewriting multiple sentences in a row and/or adding paragraphs”).
Also just de-facto, none of the posts you link trigger my “I know it when I see it” slop-detector, so you are also fine on that dimension.
Gotcha. I would feel reasonably happy if the policy said “text written or dictated by a human”, if we count my level of LLM editing followed by me editing to be overall light editing
Seems reasonable IMO!
All four of those posts look fine to me and none of them would’ve gotten flagged by the automated LLM content detection.
If your epistemic state with respect to the claims made in your posts is such that you aren’t worried about receiving questions like “Why are you so confident in [proposition X]?” and then it turning out to be the case that you in fact don’t endorse what’s written, because an LLM said something meaningfully different from what you would have said in that situation, then I think the end result is fine.
If you want to link to this comment on future posts so that readers understand how LLMs were used in the process of writing them, I think that’d be fine, but supererogatory.
Gotcha. I did not take that from the policy in the post, might be good to reword
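EDIT: In particular, as written, the below categories feel like they include my writing, but it sounds like this is not intended:
text that was written by a human and then substantially[6] edited or revised by an LLM
text that was written by an LLM and then edited or revised by a human
Separately commenting on this part:
On reflection, the thing that annoys me about this policy is that it lumps in many kinds of LLM assistance, with varying amounts of human investment, into an intrusive format that naively reads to me as “this is LLM slop which you should ignore”.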
I really very much actually tried to make the LLM content blocks as non-intrusive as possible. The design and cultural goal is definitely to communicate that it is totally fine to have a lot of LLM-generated content in your post, and that good writing will often include things that LLMs have written. Maybe we failed in the design of that, but I certainly tried very hard to make it non-intrusive.
Maybe the design could be inverted, where authors can label specific sections as human-written instead of labeling (the majority of sections as) AI-written? I think getting assistance from AI is going to be the default for more and more people, and relying on people to be up to date with LW policies AND the philosophies behind them (including how AI writing doesn’t reflect internal thought processes and such) AND to self-report LLM content (when general social stigma works against that) feels like a lot of dependencies.
(I can see other problems with this inverted design, but I will share the above anyway to spur creativity.)
There’s also something with “this section is human written” that feels nicer to me—more like an opt-in instead of a punishment.
Fair! I was reacting to the concept and didn’t pay much attention to the design. Maybe I would get used to it? I do feel like the concept is what matters here though—I don’t want to read most kinds of slop, and I expect to interpret an LLM block as “high probability of slop”
EDIT: Looking more at the examples in the post, I retract “intrusive”, but the changed font does create a subtle sense of wrongness/a weird vibe, that I could easily see becoming associated in my head with “skip, not worth my time”
I’m doing something like that too, but without the transcript part. I would interpret the rules pretty clearly as LLM output (mostly because of the last bullet point).
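I’m not sure what I expect habryka/Robert to rule here, but I think it’s at least notably different:
text that was written by an LLM and then edited or revised by a human
vs
text that was narrated by a human, transcribed and cleaned up by an LLM, then edited or revised by a human again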
I think one answer is “does the resulting stuff score highly on Pangram or not?”, and “does this smell like an LLM?” also feeds into the decision. In the case of @Neel Nanda’s linked posts, they all have a 0.0 on our LLM detector. (I haven’t looked into them that hard.) So I would guess it is fine to not put them in the LLM block.
What do you mean by without the transcript part?