Adam Jermyn comments on Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

Adam Jermyn 22 Dec 2022 23:20 UTC
LW: 5 AF: 2
A thing I really like about the approach in this paper is that it makes use of a lot more of the model’s knowledge of human values than traditional RLHF approaches. Pretrained LLMs already know a ton of what humans say about human values, and this seems like a much more direct way to point models at that knowledge than binary feedback on samples.