This is a good article and I mostly agree with it, though like Seth I find the conclusion debatable.
We’re deep into anthropomorphizing here, but even though both people and AI agents are black boxes, I think we have much more control over the behavioral outcomes of the latter.
So technical alignment is still very much on the table, but the discussion we need to have is which alignment techniques are ethical and which are not. Completely spitballing here, but dataset filtering during pre-training/fine-tuning/RLHF seems fine-ish, whereas CoT post-processing/censorship, hell, even making the CoT non-private in the first place, sounds kinda unethical?
I feel very weird even writing all this, but I think we need to start un-tabooing anthropomorphizing, because under the current paradigm it sure seems like we are not anthropomorphizing enough.