not, as far as I remember, the extrapolation to what might actually end up happening during a training process in terms of constantly overwriting many different agents values at each training step
Yeah, this specific issue is what I had in mind. Would be interesting to know whether anyone has talked about this before (either privately or publicly) or if it has just never occurred to anyone to be concerned about this until now.
Yeah, this specific issue is what I had in mind. Would be interesting to know whether anyone has talked about this before (either privately or publicly) or if it has just never occurred to anyone to be concerned about this until now.