Do you think that pretraining is currently bottlenecked by compute, or clean text data? How do you forecast this will change?