How about diffusion planning as a model? Or DreamerV3? If LLMs are the only model class you’ll consider, you have blinders on. The core of the threat model is easily demonstrated with RL-first models, and while LLMs are certainly in the lead right now, there’s no strong reason to believe the humans trying to build the most powerful AI will keep using architectures limited by the slow speed of RLHF.
Certainly I don’t think the original foom expectations were calibrated; deep learning should have obviously been going to win since at least 2015. But that doesn’t mean there’s no place for a threat model built around long-term agency — modeling that only takes long-horizon diffusion planning. Agency also shows up more the more RL you do. You added an eye-roll react to my comment that RLHF is safety-washing, but do you really think we’re in a place where the people providing the RL feedback can goalcraft AI in a way that will prevent humans from getting gentrified out of the economy? That’s just the original threat model, a little slower. So yes, maybe there’s stuff to push back on, but don’t make your conceptual brush size too big when you do. Predictable architectures are enough to motivate this line of reasoning.
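To make "long horizon diffusion planning" concrete: the point is that a whole trajectory can be sampled at once by iterative denoising, with no per-step feedback loop. Here's a toy sketch in that spirit (loosely Diffuser-style); the hand-written "denoiser", horizon, and step sizes are all made-up stand-ins for a learned model, purely for illustration:

```python
import numpy as np

HORIZON = 64     # plan length in timesteps -- long-horizon by construction
STATE_DIM = 2

def toy_denoiser(traj, goal):
    """Stand-in for a learned score model: nudges every state in the
    trajectory toward a straight-line path from traj[0] to the goal."""
    target = np.linspace(traj[0], goal, len(traj))
    return traj + 0.2 * (target - traj)

def plan(start, goal, steps=50, rng=None):
    """Sample a full plan by iteratively denoising a noisy trajectory."""
    rng = np.random.default_rng(rng)
    traj = rng.normal(size=(HORIZON, STATE_DIM))  # start from pure noise
    for _ in range(steps):
        traj[0] = start                # condition on the start state
        traj = toy_denoiser(traj, goal)
    traj[0] = start
    return traj

trajectory = plan(np.zeros(STATE_DIM), np.array([10.0, 5.0]), rng=0)
# After denoising, trajectory[-1] sits near the goal: the agent has a
# coherent 64-step plan produced in one sampling pass, not step-by-step.
```

Nothing about this loop depends on RLHF-speed human feedback, which is the architectural point above.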