As a naive follow-up: let’s say GPT-6 could be trained in 3 months on a 3GW cluster. Could I instead train it in 9 months on a 1GW cluster?
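The arithmetic behind the question can be made explicit with a minimal sketch. It assumes, as the naive framing does, that total training compute is proportional to power multiplied by time, with the same hardware and the same utilization on both clusters. The function names are hypothetical and only for illustration.

```python
def energy_budget_gw_months(power_gw: float, months: float) -> float:
    """Total energy used by a training run, in GW-months (assumes constant power draw)."""
    return power_gw * months


def months_needed(budget_gw_months: float, power_gw: float) -> float:
    """Time to spend the same energy budget on a cluster with a different power draw."""
    return budget_gw_months / power_gw


# The scenario from the question: 3 months on a 3 GW cluster.
budget = energy_budget_gw_months(power_gw=3.0, months=3.0)

# The same budget spent on a 1 GW cluster.
slow_run_months = months_needed(budget, power_gw=1.0)

print(f"Energy budget: {budget} GW-months")
print(f"Time on a 1 GW cluster: {slow_run_months} months")
```

Under that assumption the two runs use the same 9 GW-months. Whether they are equivalent in practice is what the question is asking.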