Agree with a bunch of these points. E.g., in "Reward is not the optimization target" I noted that AIXI really does maximize reward, theoretically. I wouldn't say that AIXI means we have "produced" an architecture which directly optimizes for reward, because AIXI(-tl) is a bad way to spend compute; it doesn't actually optimize reward effectively in practice.
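(For concreteness, this is Hutter's expectimax definition of AIXI, sketched roughly in my own notation: m is the horizon, U a universal monotone Turing machine, and ℓ(q) the length of program q. Reward appears directly in the objective, which is why AIXI "maximizes reward" in theory even though the definition is uncomputable and AIXI-tl is merely intractable.)

$$
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[\, r_k + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$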
I'd consider a model-based RL agent to be "reward-driven" if it's effective and most of its "optimization" comes from the direct part rather than from the leaf-node evaluation (as in e.g. AlphaZero, whose raw network was still extremely strong even without the MCTS).
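To make the "direct part vs. leaf-node evaluation" split concrete, here is a minimal toy sketch (not AlphaZero itself; every name below is my own invention): a depth-limited search that explicitly maximizes predicted reward under a model, bottoming out in a learned or heuristic leaf value. In AlphaZero terms, the recursion plays the role of the MCTS and `leaf_value` the role of the trained value network; the question is how much of the agent's competence comes from the recursion versus from whatever `leaf_value` already encodes.

```python
# Toy sketch of a model-based planner, to illustrate where "direct" reward
# optimization happens (the search) vs. the leaf-node evaluation (a learned
# or heuristic value estimate). Hypothetical names, not any real library.

from typing import Callable, List, Tuple

State = int
Action = int

def plan(
    state: State,
    depth: int,
    model: Callable[[State, Action], Tuple[State, float]],  # env model: (s, a) -> (s', r)
    actions: Callable[[State], List[Action]],                # legal actions in s
    leaf_value: Callable[[State], float],                    # value estimate at leaves
) -> Tuple[float, Action]:
    """Return (estimated return, best first action) from `state`.

    The recursion over `depth` steps is the "direct part": it explicitly
    maximizes predicted reward. The call to `leaf_value` at depth 0 is the
    leaf-node evaluation: any optimization it encodes happened at training
    time, not inside this search.
    """
    acts = actions(state)
    if depth == 0 or not acts:
        return leaf_value(state), -1  # -1 = sentinel "no action"
    best_val, best_act = float("-inf"), acts[0]
    for a in acts:
        next_state, reward = model(state, a)
        future, _ = plan(next_state, depth - 1, model, actions, leaf_value)
        if reward + future > best_val:
            best_val, best_act = reward + future, a
    return best_val, best_act

if __name__ == "__main__":
    # Toy chain environment: stepping right (+1) yields reward 1, left yields 0.
    model = lambda s, a: (s + a, 1.0 if a == 1 else 0.0)
    actions = lambda s: [-1, 1]
    # With a flat leaf value, all of the optimization comes from the direct search.
    value, action = plan(0, depth=3, model=model, actions=actions, leaf_value=lambda s: 0.0)
    print(value, action)  # 3.0 1
```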
I think it is important to recognise this, because this is the way AI systems will ultimately evolve, and it is also where most of the danger lies, versus simply scaling up pure generative models.
“Direct” optimization has not worked—at scale—in the past. Do you think that’s going to change, and if so, why?