Adrià Garriga-alonso comments on Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.