Zach Stein-Perlman comments on METR: Measuring AI Ability to Complete Long Tasks