Simon Lermen comments on Robustness of Model-Graded Evaluations and Automated Interpretability