Daniel Kokotajlo comments on Why does gradient descent always work on neural networks?