Agreed; the alignment plan sketched here skips over why alignment for a merely-human agent should be a lot easier than for a superhuman one, or why instruction-following should be easier than value alignment. I think both are probably true, but to a limited and uncertain degree. See my other comment here for more.