Yes, to be clear, I agree that, inasmuch as this question makes sense, the extrapolated volition would indeed end up basically ideal by your lights.
Regardless, the whole point of my post is exactly that I think we shouldn’t over-update from Claude currently displaying pretty robustly good preferences to alignment being easy in the future.
Cool, that makes sense. FWIW, I interpreted the overall essay to be more like “Alignment remains a hard unsolved problem, but we are on a pretty good track to solve it”, and this sentence as evidence for the “pretty good track” part. I would be kind of surprised if that wasn’t why you put that sentence there, but this kind of thing seems hard to adjudicate.