I don’t understand Hinge #2. Wall-clock time for equally large training runs 18 months apart could easily shrink by 3⁄4 for banal reasons like larger training clusters. Why would this be evidence of software-only acceleration?
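To make the arithmetic concrete, here is a rough sketch. It assumes wall-clock time is simply total training compute divided by effective cluster throughput. The function name, chip counts, per-chip throughput, and utilization are all made-up illustrative numbers, not figures from the post.

```python
# Rough sketch: wall-clock time for a fixed-size training run on two clusters.
# All numbers below are hypothetical and chosen only to illustrate the ratio.

def wall_clock_days(total_flop, num_chips, flop_per_chip_per_s, utilization=0.4):
    """Days needed to spend total_flop on a cluster, assuming constant utilization."""
    effective_flop_per_s = num_chips * flop_per_chip_per_s * utilization
    return total_flop / effective_flop_per_s / 86_400  # 86,400 seconds per day

TOTAL_FLOP = 1e25  # same training compute in both runs (assumed)

# Identical software and per-chip hardware; only the cluster size changes.
before = wall_clock_days(TOTAL_FLOP, num_chips=10_000, flop_per_chip_per_s=1e15)
after = wall_clock_days(TOTAL_FLOP, num_chips=40_000, flop_per_chip_per_s=1e15)

ratio = after / before
print(f"later run takes {ratio:.0%} of the earlier run's wall-clock time")
```

Under these assumptions a 4x larger cluster alone cuts wall-clock time by 3⁄4, with no software change at all.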