Update: A reader suggested that in the open-source implementation of PopArt, the PopArt normalization happens after the reward clipping, counter to my assumption. I no longer understand why PopArt is helping, beyond “it’s good for things to be normalized”.
Update: A reader suggested that in the open-source implementation of PopArt, the PopArt normalization happens after the reward clipping, counter to my assumption. I no longer understand why PopArt is helping, beyond “it’s good for things to be normalized”.