When you say AI might be aligned, you seem to mean alignment to the moral good: an AI that “genuinely has humanity’s interests at heart”, so much so that it redistributes all wealth. That is possible, but it is very hard, it is not what current mainstream alignment research is working on, and companies have no incentive to switch to such a paradigm.
Eliezer Yudkowsky has repeatedly stated that he does not think “moral good” is the hard part of alignment. He thinks the hard part is getting the AI to do anything at all without somehow subverting its creator’s intent:
Eliezer: I mean, I wouldn’t say that it’s difficult to align an AI with our basic notions of morality. I’d say that it’s difficult to align an AI on a task like “take this strawberry, and make me another strawberry that’s identical to this strawberry down to the cellular level, but not necessarily the atomic level”. So it looks the same under like a standard optical microscope, but maybe not a scanning electron microscope. Do that. Don’t destroy the world as a side effect.
Now, this does intrinsically take a powerful AI. There’s no way you can make it easy to align by making it stupid. To build something that’s cellular identical to a strawberry—I mean, mostly I think the way that you do this is with very primitive nanotechnology, but we could also do it using very advanced biotechnology. And these are not technologies that we already have. So it’s got to be something smart enough to develop new technology.
Never mind all the subtleties of morality. I think we don’t have the technology to align an AI to the point where we can say, “Build me a copy of the strawberry and don’t destroy the world.”
https://www.alignmentforum.org/posts/Aq82XqYhgqdPdPrBA/full-transcript-eliezer-yudkowsky-on-the-bankless-podcast
I often post comments criticizing or disagreeing with Eliezer, but I think he is probably correct on this particular point.