Reflectivity in alignment.
Human values and AI alignment do not exist independently. In several situations they affect each other, creating a complex pattern of reflection.
Examples:
Humans want to align AI, so “AI alignment” is itself a human value.
Many human values are convergent evolutionary goals (like survival and reproduction) and thus resemble an AI’s convergent instrumental goals.
If humans come to accept the goal of making paperclips (or whatever the AI pursues), alignment is trivially reached.
Many humans apparently want to create non-aligned AI; an AI aligned with their wishes would therefore be non-aligned.
Humans may not want their values to be learned at all; in that case the alignment process itself is misaligned with them.
Humans who merge with AI are no longer baseline humans, and so fall outside the scope of alignment.
A not-yet-aligned AI will affect human values in the very process of learning them.
Many humans don’t want AI to exist at all, so for them any aligned AI is misaligned.
A human may want the AI not to be aligned with some other person.
An AI aligned with a malicious or confused human is itself unaligned.
Because human values change over time, any AI aligned today will soon be non-aligned.
By saying “human values” we already exclude the values of other mammals, of groups, etc., and thus predefine the outcome.
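One hedged way to compress these examples into a single pattern (the notation V, A, f below is illustrative, not from the original list): let V_t be the human value state at time t, let A(V) be an AI trained to be aligned with values V, and let f describe how a deployed AI changes those values. The reflection loop is then

V_{t+1} = f(V_t, A(V_t))

and stable alignment requires a fixed point V* = f(V*, A(V*)). Most of the examples above can be read as cases where such a fixed point fails to exist, is not unique, or is itself objectionable.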