Great piece! Agree with a lot here. Loved that you even addressed the intermediate risk of dumb but dangerous.
Another angle to consider is a sufficiently advanced agent that is expert at the component pieces of appropriately scoped paperclip manufacturing from biomass, but that overestimates its ability to train other, less adaptive systems to follow its goals.
Basically a factory pattern for alignment (we can already see this with very capable models being quite poor at operating subagents, because they just extend the training patterns their own developers used on them). A sketch of the analogy follows below.
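To make the analogy concrete, here's a minimal sketch; the class, the `alignment_recipe` field, and the `spawn_subagent` factory method are all hypothetical, just illustrating a parent model that can only stamp out subagents using the same alignment recipe that was applied to it:

```python
# Hypothetical sketch of the "factory pattern" alignment failure mode:
# the parent agent reproduces the alignment recipe its own developers
# used on it, whether or not it suits the subagent.

from dataclasses import dataclass

@dataclass
class Agent:
    capability: float       # how generally capable the agent is
    alignment_recipe: str   # the training pattern applied to it

    def spawn_subagent(self, capability: float) -> "Agent":
        # Factory method: the parent reuses its *own* recipe verbatim,
        # even though a less adaptive subagent may need a different one.
        return Agent(capability, self.alignment_recipe)

parent = Agent(capability=0.9, alignment_recipe="recipe-tuned-for-parent")
worker = parent.spawn_subagent(capability=0.2)
# worker inherits a recipe calibrated for a far more capable system.
```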
I agree that here too the "end of the lightcone" outcome theoretically wouldn't materialize, such a model having been outcompeted by more generally capable ones.
But it could extend the window of intermediate dumb dangers considerably, since we're at the mercy not only of the best and brightest but also of the lowest end of the bar.
To riff on the old joke: "somewhere out there is the worst operational AI in the world, and right now someone is asking it for more paperclips."
I agree, and that's why I think current techniques for (and attempts at) alignment, in particular if replicated across all the big labs, constitute the largest risk factor for Skynet-style or, worse still, boring futures (after, of course, a pause).