Tom Lieberum comments on Hypothesis: gradient descent prefers general circuits