⟨M,N,ε,δ⟩-agreement

Tag. Last edit: 8 Dec 2025 16:44 UTC by Aran Nayebi

A generalization of Aumann’s Agreement Theorem across objectives and agents that does not assume common priors. The framework also encompasses Debate, CIRL, and Iterated Amplification. See Nayebi (2025) for the formal definition, and "From Barriers to Alignment to the First Formal Corrigibility Guarantees" for applications.

From Barriers to Alignment to the First Formal Corrigibility Guarantees

Aran Nayebi, 8 Dec 2025 12:31 UTC
61 points
11 comments · 11 min read · LW link