An alignment tax (sometimes called a safety tax) is the additional cost incurred in making an AI system aligned, relative to building an unaligned one.
Approaches to the alignment tax
Paul Christiano distinguishes two main approaches for dealing with the alignment tax.[1][2] One approach seeks to find ways to pay the tax, such as persuading individual actors to pay it or facilitating coordination of the sort that would allow groups to pay it. The other approach tries to reduce the tax, by differentially advancing existing alignable algorithms or by making existing algorithms more alignable.
Further reading
Askell, Amanda et al. (2021) A general language assistant as a laboratory for alignment, arXiv:2112.00861 [cs].
Xu, Mark & Carl Shulman (2021) Rogue AGI embodies valuable intellectual property, LessWrong, June 3.
Yudkowsky, Eliezer (2017) Aligning an AGI adds significant development time, Arbital, February 22.
Copied over from the corresponding tag on the EA Forum.