I really appreciate the call-out where modern RL for AI does not equal reward-seeking (though I also appreciate @tailcalled ’s reminder that historical RL did involve reward during deployment); this point has been made before, but not so thoroughly or clearly.
A framing that feels alive for me is that AlphaGo didn’t significantly innovate in the goal-directed search (applying MCTS was clever, but not new) but did innovate in both data generation (use search to generate training data, which improves the search) and offline-RL.
Would you say that models designed from the ground up to be collaborative and capabilitarian would be a net win for alignment, even if they’re not explicitly weakened in terms of helping people develop capabilities? I’d be worried that they could multiply human efforts equally, but with humans spending more effort on capabilities, that’s still a net negative.