Neel Nanda comments on Circuit discovery through chain of thought using policy gradients