gwern comments on "How well can the GPT architecture solve the parity task?"