The human delegation and verification vs. generation discussion is in instrumental values mode, so what matters there is alignment of instrumental goals via incentives (and the practical difficulty of gaming them too much), not alignment of terminal values. Verifying all work is impractical compared to setting up sufficient incentives to align instrumental values to the task.
Yeah, I was lumping instrumental values alignment in with not actually trying to align values, which was the important part here.
For AIs, that corresponds to mundane intent alignment, which also works fine while AIs don’t have practical options to coerce or disassemble you, at which point ambitious value alignment (suddenly) becomes relevant. But verification/generation is mostly relevant for setting up incentives for AIs that are not too powerful (what it would do to ambitious value alignment is anyone’s guess, but probably nothing good). Just as a fox’s den is part of its phenotype, incentives set up for AIs might take the form of weight updates or psychological drives, but that doesn’t necessarily make them part of the AI’s more reflectively stable terminal values once it’s no longer at your mercy.
The main benefit of the verification vs. generation gap is that it makes proposals like AI control and AI-automated alignment more valuable.
To be clear, the verification vs. generation distinction isn’t an argument for why we don’t need to align AIs forever, but rather a supporting argument for why we can automate away the hard part of AI alignment.
There are other principles that would be used, to be clear, but I mentioned the verification/generation difference to partially justify why AI alignment can be done soon enough.
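For concreteness, here’s a minimal sketch of the asymmetry being leaned on (my own toy example, not anything from the thread), using boolean satisfiability as a stand-in for “hard work”: checking a proposed solution is a linear pass over the formula, while producing one by brute force is exponential in the number of variables.

```python
# Toy illustration of the verification/generation gap (illustrative analogy
# only; the thread's claim is about verifying AI work products, not SAT).
from itertools import product

# A CNF formula as a list of clauses; each literal is (variable_index, is_positive).
Clause = list[tuple[int, bool]]

def verify(formula: list[Clause], assignment: list[bool]) -> bool:
    """Cheap: one linear pass over the clauses."""
    return all(
        any(assignment[var] == positive for var, positive in clause)
        for clause in formula
    )

def generate(formula: list[Clause], n_vars: int) -> list[bool] | None:
    """Expensive: worst case checks all 2**n_vars assignments."""
    for bits in product([False, True], repeat=n_vars):
        candidate = list(bits)
        if verify(formula, candidate):
            return candidate
    return None

# (x0 or not x1) and (x1 or x2)
formula = [[(0, True), (1, False)], [(1, True), (2, True)]]
solution = generate(formula, 3)                 # slow path: exponential search
assert solution is not None and verify(formula, solution)  # fast path: linear check
```

The hope in the AI-automated-alignment framing is that humans sit on the cheap `verify` side while AIs do the expensive `generate` side.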
Flag: I’d say ambitious value alignment starts becoming necessary once AIs can arbitrarily coerce/disassemble/overwrite you and no longer need your cooperation or time to do it, unlike real-world rich people.
The issue that makes ambitious value alignment relevant is this: once you stop depending on a set of beings you once depended on, there’s no intrinsic reason not to harm or kill them if it benefits your selfish goals, and for future humans/AIs there will be a lot of such opportunities. That means you now, at the very least, need enough value alignment that the AI will take somewhat costly actions to avoid harming or killing beings that have no bargaining or economic power or worth.
This is very much unlike any real-life case of a society existing, and it’s a reason why current mechanisms like democracy and capitalism, which try to make values less relevant, simply do not work for AIs.
Value alignment is necessary in the long run for incentives to work out once ASI arrives on the scene.
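A toy way to see the structure of this argument (my own sketch; the function name and numbers are made up for illustration): an agent keeps refraining from harm only while the value of your cooperation, plus whatever intrinsic penalty its own values assign to harming you, outweighs what it could gain by defecting. Once dependence goes to zero, only the values term is left doing any work.

```python
# Toy model (my illustration, not from the thread): an agent refrains from
# harming you iff what it loses by doing so outweighs what it gains.
# Losses = future value of your cooperation (dependence) + the intrinsic
# disvalue its own terminal values assign to harming you (alignment_penalty).

def refrains_from_harm(dependence: float, alignment_penalty: float,
                       gain_from_harm: float) -> bool:
    return dependence + alignment_penalty >= gain_from_harm

# Pre-ASI: the AI still needs humans, so incentives alone suffice.
assert refrains_from_harm(dependence=10.0, alignment_penalty=0.0,
                          gain_from_harm=5.0)

# Post-ASI: dependence drops to ~0; only the value-alignment term can
# keep "don't harm the powerless" optimal.
assert not refrains_from_harm(dependence=0.0, alignment_penalty=0.0,
                              gain_from_harm=5.0)
assert refrains_from_harm(dependence=0.0, alignment_penalty=6.0,
                          gain_from_harm=5.0)
```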