Douglas_Knight comments on Which parts of the existing internet are already likely to be in (GPT-5/other soon-to-be-trained LLMs)’s training corpus?