Learning from Human Preferences—from OpenAI (including Christiano, Amodei & Legg)

Link post