Interesting. A nitpick: once LLM text enters the pretraining dataset with diverse causal graphs at least as complex as (prompt → LLM generation → post-processing), the models are heavily incentivized during pretraining to model LLM internals. Pretraining on this kind of text introduces lots of implicit tasks like "which model made this text?", "what was the prompt?", "was the model degraded by too much context?", "is this the output of a LoRA or a full finetune?", etc. (This assumes an oracle answering these questions would let you predict the next token more accurately, which seems exceedingly likely.) I expect this effect to be much more robust when induced by webtext containing LLM content, with extreme diversity in causal graphs, than by e.g. the gpt-oss training dataset, which has only one or a small number of underlying production mechanisms.
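To make the parenthetical concrete, here's a toy sketch (my own illustration, with made-up numbers, not anything from actual pretraining data): if text in the corpus comes from two hypothetical generators with different token statistics, a predictor that recovers the latent "which generator produced this?" variable achieves strictly lower expected next-token loss than one predicting from the mixture alone, so the pretraining objective rewards learning to infer that latent.

```python
import numpy as np

# Two hypothetical "generators" (e.g. two LLM/prompt/post-processing pipelines)
# emit tokens from a 4-token vocabulary with different distributions.
gen_a = np.array([0.70, 0.10, 0.10, 0.10])   # token distribution of generator A
gen_b = np.array([0.10, 0.10, 0.10, 0.70])   # token distribution of generator B
p_a, p_b = 0.5, 0.5                          # how often each appears in the corpus

mixture = p_a * gen_a + p_b * gen_b          # best guess without the oracle

def entropy(p):
    """Shannon entropy in nats = expected log loss of the optimal predictor."""
    return -np.sum(p * np.log(p))

loss_without_oracle = entropy(mixture)                          # H(X)
loss_with_oracle = p_a * entropy(gen_a) + p_b * entropy(gen_b)  # H(X | Z)

print(f"loss without oracle: {loss_without_oracle:.3f} nats")   # ~1.194
print(f"loss with oracle:    {loss_with_oracle:.3f} nats")      # ~0.940

# H(X | Z) <= H(X), with equality only when the latent is uninformative,
# so any internal feature that recovers Z from the text lowers pretraining loss.
```

The same argument applies to each of the listed questions (model identity, prompt, context degradation, finetuning method): whenever the answer carries information about the next token, gradient descent on the pretraining loss pays for circuits that infer it.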