I’d love to have a dialogue.
Topics:
1) Alignment strategy: from here to a good outcome for humanity.
2) Alignment difficulty: I think this is a crux of 1), and that nobody has a good estimate right now.
3) Alignment stability: I think this is a crux of 2), and nobody has written much about this.
4) Alignment plans for RL agents, particularly the plan for mediocre alignment.
5) Alignment plans for language model agents (not language models), for instance, this set of plans.