I meant if the predictor were superhumanly intelligent.
You've spent years studying alignment? If so, I think your posts would do better by including more ITT/steelmanning of that worldview.
I agree with your arguments that alignment isn't necessarily hard. I think there is a complementary set of arguments against alignment being easy. Both must be addressed and factored in to produce a good estimate of alignment difficulty.
I’ve also been studying alignment for years, and my take is that everyone has a poor understanding of the whole problem and so we collectively have no good guess on alignment difficulty.
It's just really hard to accurately imagine AGI. If it's just a smarter version of LLMs that acts as a tool, then sure, it will probably be aligned well enough, just like current systems.
But it almost certainly won’t be.
I think that's the biggest crux between your views and mine. Agency and memory/learning are too valuable and too easy to add to stay out of the picture for long.
I'm not sure the reasons Claude is adequately aligned won't generalize to an AGI that differs in those ways, but I don't think we have much reason to assume they will.
I've probably expressed this best in LLM AGI may reason about its goals, the post I linked previously.