Nice observation, and I agree with your calculation that linear episode length growth would account for a worse scaling exponent by a factor of 2 (or more generally, episode length growing with exponent k would account for a worse scaling exponent by a factor of k+1).
Note also that this suggests a potential remedy, namely controlling episode length, but there is less incentive to apply this when data is more of a constraint than compute.
Nice observation, and I agree with your calculation that linear episode length growth would account for a worse scaling exponent by a factor of 2 (or more generally, episode length growing with exponent k would account for a worse scaling exponent by a factor of k+1).
Note also that this suggests a potential remedy, namely controlling episode length, but there is less incentive to apply this when data is more of a constraint than compute.