I agree with what you’re saying. Perhaps, I’m being a bit strong. I’m mostly talking about ambitious value learning in an open-ended environment. The game of Go doesn’t have inherent computing capability so anything the agent does is rather constrained to begin with. I’d hope (guess) that alignment in similarly closed environments is achievable. I’d also like to point out that in such scenarios I’d expect it to be normally possible to give exact goal descriptions rendering value learning superfluous.
In theory, I’m actually onboard with a weakly superhuman AI. I’m mostly skeptical of the general case. I suppose that makes me sympathetic to approaches that iterate/collectivize things already known to work.