I think this post doesn’t violate the letter of the Policy for LLM Writing on LessWrong. It’s clearly a topic you’ve thought a bunch about, and so I imagine you’ve put in a lot of time per word, even if a lot of it appears to be verbatim LLM prose.
That said, it feels somewhat ironic that this post, which is partially about the need for precise definition, has a bunch of its phrasing chosen (as far as I can tell) by an LLM. And that makes it hard to trust your references to your experience or knowledge, cos I don’t know if you wrote them!
If you write “As a lawyer, I can’t overstate how badly this type of definition fails to accomplish its purpose,” you probably mean it a decent amount. You might notice the quiet “that’s more hyperbolic than I mean to be” as you type it out, and then rewrite it. If you read something written for you, you’re more likely, in my experience, to think “yeah, that sounds about right”.
That said, I appreciate posts being written about this topic! I appreciate that you’re trying to explain the gears of the models you have as a lawyer, rather than solely appealing to authority. But I would be more willing to engage with the substance of the post if I felt that you had written it, or at least it didn’t have a bunch of an LLM’s rhetorical flourishes.
Thank you Kave (also Elisabeth and Ben).
I’ve asked for clarification on LessWrong’s LLM usage policy via DM.
That said, for readers:
The TL;DR at the end was summarised using an LLM, because I wanted to provide a quick summary for people to skim.
The ideas in this post are mine. The hyperbolic “as a lawyer” talk, and the “these are merely illustrative examples” you mentioned, are, sadly, mine. It’s not the first time I’ve been told “this sounds like ChatGPT”.
At some point, someone commented on one of my posts that I should avoid being too verbose if I want LessWrong posts to actually be read. Since I care about this issue so much, I asked an LLM to trim and edit what I wrote to make it more digestible for a non-legal audience. Hence why it may have triggered the LLM alert (Ben).
As for my credentials, feel free to check out my LinkedIn. I’ve also linked my Substack to my profile.
Notice how I just said “hence”… I talk like this 🥲. That’s precisely why I wanted to edit the post to make it better for LessWrong, which apparently backfired.
Please let me know if anything contravenes the site’s policy. I’ll also keep this feedback in mind. I’ve updated towards “posting as is”, even if I’m nervous that people will think it sounds weird… because LLM editing doesn’t help.
(I also posted this on my Substack, where it doesn’t have the TL;DR; that was added here for anyone who only wants to skim.)
I also use some LLM-y phrases or punctuation sometimes! It’s a bit disturbing when it happens, but that’s life. I still remember the first time I wrote a little pastiche for a group chat and someone asked if ChatGPT wrote it… alas!
I’d like to clarify why I left my comment.
This post is pretty highly upvoted. In fact, it’s the second most upvoted post of the week it was published. That makes it both very prominent, and somewhat norm-establishing for LessWrong.
That makes it particularly important to, like Habryka said, clarify an edge case of our LLM writing policy. I wouldn’t be surprised if this post gets referenced by someone whose content I reject, or return to draft. I want to be able to say to the person whose content I reject, “Yep, your post isn’t that much less-edited than Katalina’s, but Katalina’s was explicitly on the edge, and I said so at the time”.
Separately, I wanted to publicly push back against LLM writing on LessWrong. Because this post is so upvoted, I think it risks normalising this level of LLM writing. I think it would be quite bad for LessWrong if people started to post a bunch of LLM writing to it[1]; it’s epistemically weak in some ways I mentioned in my previous comment (like kind of making up people’s experience)[2].
Thanks for all your work on law and AI. I know this is the second time I’ve moderated you recently, and I appreciate that you keep engaging with LessWrong! I think the legal lens is valuable and underprovided here, so thanks for that. I would like to engage with your arguments more (I think I might have some substantive disagreements with parts of this post), but this isn’t the post I’ll do it on.
P.S. “As a lawyer” was not the LLM-y part of that sentence. I can imagine lots of these are just normal writing, but the density seemed quite high for no LLM involvement.
At least, this month. Maybe at some point soon LLM writing will become valuable enough that we should use it a lot.
I also find LLM writing quality to be weak, but I am more willing to accept we should have bad writing than bad thinking on LessWrong.
Thank you for your feedback.
As I said before, I think it’s a bit unfair to call it “LLM writing” when it was only LLM-edited rather than entirely generated. That’s also why I sought clarity on the actual policy (if there is one): if “LLM writing” is going to be scrutinised and put in the spotlight with moderation comments, it’d be helpful to know what counts as LLM writing versus LLM-assisted or LLM-edited.
The wording you’ve used seems to accuse me of actually generating this post with AI rather than using it to edit the ideas (see Buck’s recent comment about how he wrote Christian homeschoolers in the year 3000). Would that be LLM writing?
To the others, thank you for your feedback. I honestly care more about the definitional work than about the writing sounding or looking better. So I’ll avoid LLM editing in the future: I’d rather not risk it distracting readers from what matters.
Yes, the moderator comment part is a good question; I nearly mentioned that explicitly.
I wanted to make it clear I was clarifying an edge case, and setting some precedent. I also wanted to use a bit of “mod voice” to say that LessWrong is generally not a place where it’s OK to post heavily-LLM-produced writing. I think those are appropriate uses of the moderator comment, but I’m substantially uncertain.
Regarding policies: on LessWrong, moderation is mostly done reactively by moderators, who intervene when they think something is going wrong. Mostly, we don’t appeal to an explicit policy, but try to justify our reasoning for the decision. Policies clarified upfront are the exception rather than the rule; the LLM writing policy was largely (I’m tempted to say primarily?) written to make it easy to handle particularly egregious cases, like users basically posting the output of a ChatGPT session, which, IIRC, was happening at a noticeable frequency when the policy was written.
It takes more time and effort to moderate any given decision in a reactive way, but it saves a lot of time up front. I also think it makes it easier for people to argue with our decisions, because they can dispute them in the specific case, rather than trying to overturn a whole explicit policy. Of course, there are probably also costs borne from inconsistency.
I didn’t like the writing in Buck’s post, but I didn’t explicitly notice it was AI. I’m treating the fact that I didn’t notice it as a bellwether for its acceptability; Buck, I think, exerted a more acceptable level of control over the final prose. Another factor is the level of upvoting. Your post was substantially more upvoted (though the gap is narrowing).
If I were to rewrite the LLM policy, I think I would be more precise about what people must do with the “1 minute per 50 words”. I’m tempted to ask for that time to be spent copy-editing the output, not thinking upfront or guiding the LLM. I think that Buck’s post would be in violation of that rule, and I’m not confident whether that would be the right outcome.
I actually think that this rewrite of the policy would be beneficial. It may not be the default opinion but, for me, it’s better to have a reference document which is well-specified. It also promotes transparency of decision-making, rather than risking moderation looking very subjective or “vibes-based”.
As I mentioned in the DM: there’s probably an unfair disadvantage for policy or legal writing, since it already sounds “more similar” to an LLM. Naturally, once edited using an LLM, it will likely sound even more LLM-like than writing about philosophy or fiction. Maybe that’s just a skill issue 🤣 but that’s why I vote “yes” on you adding that change. Again, I will keep this feedback very present in the future (thank you for encouraging me to write more naturally and less over-edited. Tbh, I need to untrain myself from the “no mistakes allowed” legal mindset ☺️).
Fun ending remark: I was in a CAIDP meeting recently where we were advised to use a bunch of emojis for policy social media posts. And a bullet-pointed structure… But when I’ve done that, people said it made the posts look AI-generated…
In the end, exchanges like these are helping me understand what gets through to people and what doesn’t. So, thank you!
Nothing you did went against site policy! Kave was just giving feedback as a moderator and clarifying an edge-case.
Thank you! I just wanted to be very sure. And I appreciate the feedback too. I’ll keep posting and I’ll do better next time.
What are you noticing that smells like LLM? I only skimmed, but I didn’t see anything that tripped my radar, and lawyer talk can sound a lot like LLM talk.
Things that tripped my detector (which was set off before reading kave’s comment):
Excessive use of bolded passages throughout.
Bulleted lists, especially with bolded headings in each bullet.
Very frequent use of short parenthetical asides where it doesn’t make a lot of sense to use a parenthetical instead of a comma or a new sentence. E.g. the parenthetical “(compute caps)” reads more naturally as “[comma] like compute caps”; the parenthetical “(actual prison time)” is unnecessary; the phrase “its final outcome (extinction)” should just be “extinction”.
Obviously humans write unnecessary or strained clauses all the time, but this particular sort of failure mode is like 10x more common in LLMs.
I would wildly conjecture that this behavior is in part an artifact of not having a backspace key: when the LLMs write something that’s underspecified or overstated, the only option they have is to amend it with a parenthetical rather than rewrite the previous sentence.
Rule of three: “the X, Y, or Z”. “Sentence A. Sentence B. And [crucially / even / most importantly], sentence C.” Obviously not a dead giveaway in one usage, but LLMs do this at a rate at least twice the human baseline, and the bits add up.
I’m not sure I can distill a nice rule here, but there’s a certain sort of punchy language that is a strong tell for me, where it’s like every paragraph is trying to have its own cute rhetorical flourish of the sort a human writer would restrain themselves to doing once at the end. It shows up especially often in the format “[short, punchy phrase]: [short sentence]”. Examples in this post:
“The principle is clear: regulate by measurable inputs and capabilities, not by catastrophic outcomes.”
“We don’t have this luxury: we cannot afford an AGI ban that is ‘80% avoided.’”
I just want to say that all the things you listed (except maybe the last one, not sure) are things that I use routinely—as well as other LLM-associated things, such as liberal em-dashes—and I endorse my doing so. I mean, it’s fair to use them as yellow flags indicating LLM writing, but if we delete words like “excessive” from your descriptions, these behaviors don’t seem inherently bad. I do think LLM writing is bad, both because
the actual words are bad (e.g. frequent use of vague words which, like a stable-diffusion image, kinda make sense if your eyes are glazed over but are meaningless slop if you think about it more sharply), and also because
LLM writing where any words that are apparently-meaning-bearing were principally added by the LLM will fail to be testimony, which destroys most of their value as communication: https://www.lesswrong.com/posts/KXujJjnmP85u8eM6B/policy-for-llm-writing-on-lesswrong?commentId=MDtbuQZcaXoD7r4GA
Yeah, I’m trying to distill some fuzzy intuitions that I don’t have a perfectly legible version of, and I do think it’s possible for humans to write text that has these attributes naturally. I am pretty confident that I would have a good AUROC at distinguishing text written by humans from LLM-generated content, even when the humans match many of the characteristics here; nothing in the last 10 comments you’ve written trips my AI detector at all.
(I also use bulleted lists, parentheticals, and em-dashes a lot and think they’re often part of good writing – the “excessive” is somewhat load-bearing here.)
Mhm. Fair enough. I mean, I believe you about having good AUROC. I think I would too, except that I don’t really care whether someone used LLMs at all, or even substantially; rather, the thing I’d both care about and also expect to have fairly good discernment ability for is “Is the contentful stuff coming from a human mind?”. E.g. the bullet point

Nuclear treaties succeeded by banning concrete precursors (zero-yield tests, 8kg plutonium, 25kg HEU, 500kg/300km delivery systems), not by banning “extinction-risk weapons.” AGI bans need analogous thresholds: capabilities like autonomous replication, scalable resource acquisition, or systematic deception (these are merely illustrative examples, not formal proposals).
sounds pretty clearly like human content, and it’s useful content. So at that point I don’t care much whether some glue sentences or phrasing are LLMed. I think both my own writing and other people’s writing, before LLMs were a thing, already frequently contained some admixture of lazy wording and phrasing, which amounts to basically being LLM slop. I think that’s fine; writing everything strictly using only living language would be a lot of work. And you just skim those parts / don’t think about them too hard, but they’re there if you needed more signposting or something. (Not presuming you necessarily disagree with any of this, just riffing.)
One issue for me: I don’t want to spend that much time reading text where most of the content didn’t come from a human mind. If someone used a bunch of LLM, that makes the contentful stuff less likely to be meaningful. So I want to make use of quick heuristics to triage what I read.
This was added raw by me 🤣: “We don’t have this luxury: we cannot afford an AGI ban that is ‘80% avoided.’” It’s in relation to the previous example about tax avoidance.
As I said to Kave: this is really helpful. I’ll keep this feedback present for future occasions. I really believe in the importance of a well-specified AGI ban. I’d rather not risk people taking it less seriously because of details like these! 🙏🏼.
The most egregious example in the TL;DR to me is “Anything less collapses into Goodharting and fines-as-business-cost”. Some other yellow-to-red-flag examples in that section: “regulation has to act before harm occurs, not after”, the bulleted list with bold intros, the “not outcome or intent” finisher for a sentence, and “these are merely illustrative examples, not formal proposals”.
Not sure what tipped off kave, but I just put it into my favorite online AI detector and it came out with high confidence that ~60% of the text was AI-written.
For what it’s worth, while I did notice some bits of this sounding a bit LLM-y, it didn’t bother me at all and I would consider this post just straight-up fine rather than borderline okay.