This assumes a highly intelligent system can’t adequately anticipate problems with its own functioning. I’ve read these arguments carefully, and I don’t buy them. The “proofs” show only that misalignment happens eventually, by the end of time. I think improvements in intelligence likely outpace this problem, so the practical answer is that it takes longer than the universe lasts, at least for an ASI that strongly “wants” to maintain its goals/values (as is instrumentally convergent under many reasonable assumptions).