I think there’s probably a reason that merely claiming this isn’t sticking: you haven’t specified a mechanism that can stick in people’s heads. “Product alignment != superintelligence alignment” fits in five words (kinda), but doesn’t give a reason in five words. I’d rather say “Local alignment != asymptotic alignment”.
local alignment: your (empirical or formal) alignment bounds are tight enough that your alignment generalizes within a known regime.
asymptotic alignment: you have some form of confidence that your alignment uncertainty goes down as the model does more work.
I claim you can have asymptotic alignment without having a formally certified proof of asymptotic alignment, but that it would be surprising to have empirical asymptotic alignment without the model confidently telling you that it expects that someday, it or a successor will be able to give a formal proof of alignment. Of course, any model could say that; you’d need to be able to check that it seems justified for it to say that. So to have strong empirical asymptotic alignment, you’d need to have solved basically all the ongoing empirical alignment challenges.
I’m apparently quite bad at getting posts out the door, so it’s reference-class unlikely I’ll get this one out the door, but I have a post cooking that would give an overview of the difference. I have an undercooked post I could hit publish on, which is just me prompting Claude to explain the difference; I’ve added you to review that.
I agree that’s a better ontology; this was the post I could write fast, as a patch. Looking forward to yours!
I might read your half-baked ones, and I’d be up for coauthoring the real thing if you want.
Edit: Added a note to the main post pointing at this.
I see how “Local alignment != asymptotic alignment” is more accurate but I find the current title/claim easier to understand.
I could see them being a good pair, where the current title makes the claim and the local-vs-asymptotic stuff adds the mechanism. But the mechanism without the claim, I would fear, would fail to land for many people. Just something to keep in mind as one data point if you ever do a followup post :)
Yeah, this one does feel more memetically powerful in some ways, but also somewhat less collaborative. Agree we’d probably want the pair.