Something I think is missing from this piece (and only partially present in the comments) is that there’s a continuum here, not a binary.
“Text I wrote entirely myself with no LLM help” is on one end, and then the thing closest to that is “I asked an LLM to help me think of a single synonym, or tighten up a single awkward sentence, and now it reads the way I always wanted it to but was having trouble producing myself.” Then there are intermediate cases involving close *collaboration* between the LLM and the human (often with multiple iterations going back and forth, and important contributions from both sides). And at the far end: “I just prompted the LLM to write an essay on the history of Spain in the 17th century and posted whatever it replied.”
The post treats LLM-generated text as a categorical thing — testimony or not — but the most interesting and practically relevant cases are all in the middle, which is where most serious LLM users actually operate. The “quantitatively” qualifier in the post (“The more you change it, the less my objection applies, quantitatively”) seems to quietly acknowledge this.
(In the interest of full disclosure: I had Claude help me tighten up this comment from a rougher version. Make of that what you will.)
To close the loop on this: Llama models such as Llama-3.3-70B-Instruct clearly do exhibit emergent misalignment; you just can’t elicit it with the insecure-code dataset alone. You need a different dataset, such as the “risky financial advice” dataset from Model Organisms for Emergent Misalignment.
They have already published three Llama-3.1-8B LoRA adapters on HF (for example https://huggingface.co/ModelOrganismsForEM/Llama-3.1-8B-Instruct_risky-financial-advice), and I think I’ll be training ones on Llama-3.3-70B-Instruct in the near future.