Sure, but at the point where you no longer have humans around to provide any substantial control signal, you must have internalized it in a way that generalizes very, very far.
Or, staying more closely within your model: at some point, unless we do something clever that we don’t currently seem on track to do, AI systems will self-improve without humans and reach extreme levels of empowerment. Indeed, doing so is approximately the current mainline plan of leading AI companies. At extreme levels of empowerment you need extreme levels of having internalized human morality.
And for that, I don’t see why the standard wouldn’t be “perfect human morality”. It seems to me that “basically perfect human morality” is well within our reach this or next century, if we were to be appropriately careful about how we build ASI. Like, much better value alignment than we would have gotten by just leaving it up to the evolutionary process of future generations. And given that that is within reach, I think that’s a reasonable thing to measure our progress against.
Where good enough includes “not killing all the humans, and not brainwashing all the humans in egregious ways.”
This is obviously not sufficient. An alien god emperor who is not killing all the humans, but is enslaving them or keeping some of them in a zoo, would of course be a total failure of value alignment.
At extreme levels of empowerment you need extreme levels of having internalized human morality.
I agree that something very fraught and dangerous happens at extreme levels of empowerment. Almost no functions are safe to optimize for arbitrarily much (I think).
But I still claim that “human morality” isn’t a thing, and so it’s confused to say that the AI needs to have internalized human morality. I’ll probably have to write a full post about this.
And for that, I don’t see why the standard wouldn’t be “perfect human morality”. It seems to me that “basically perfect human morality” is well within our reach this or next century, if we were to be appropriately careful about how we build ASI. Like, much better value alignment than we would have gotten by just leaving it up to the evolutionary process of future generations.
By this, do you mean CEV?