I would add another type: self-play at training time. As the article discusses, forms of self-play were recently published for reasoning RL, and frontier AI companies may have been using them earlier than that.
Just nitpicking, but I would classify self-play as a subset of R&D. The AI is exploring some… space… by experimenting and learning; it’s just a very narrow space.
Agreed.