I’d love to have a dialogue.
Topics:
1) Alignment strategy: from here to a good outcome for humanity.
2) Alignment difficulty: I think this is a crux of 1), and that nobody has a good estimate right now.
3) Alignment stability: I think this is a crux of 2), and nobody has written much about this.
4) Alignment plans for RL agents, particularly the plan for mediocre alignment.
5) Alignment plans for language model agents (not language models), for instance, this set of plans.