gwern comments on Transformer inductive biases & RASP