gwern comments on [updated] how does gpt2's training corpus capture internet discussion? not well