I guess the team kept improving the RL algorithm and network until it beat all the engines, and then stopped?