Wouldn’t outcome-based “not doing bad things” impact alignment still run into that capabilities issue? “Not doing bad things” requires serious capabilities for some goals (e.g. sparse but initially achievable goals).
In any case, you can say “I think that implementing strong capabilities + strong intent alignment is a good instrumental strategy for impact alignment”, which seems compatible with the distinction you seek?