I’m doing something like that too, but without the transcript part. I would interpret the rules pretty clearly as LLM output (mostly because of the last bullet point).
I’m not sure what I expect habryka/Robert to rule here, but I think it’s at least notably different:
text that was written by an LLM and then edited or revised by a human
vs
text that was narrated by a human, transcribed and cleaned up by an LLM, then edited or revised by a human again
I think one answer is “does the resulting text score highly on Pangram or not?”, and “does this smell like an LLM?” also feeds into the decision. In the case of @Neel Nanda’s linked posts, they all score 0.0 on our LLM detector (I haven’t looked into them that hard), so I would guess it is fine to not put them in the LLM block.
What do you mean by “without the transcript part”?