“Redundant” AI Alignment

This is a post I wrote on my personal blog after a discussion with a deep learning professor at the University of Chicago. I don’t know if this particular topic has been studied in much depth elsewhere, so I figured I would share it here. If you know of any related work (or have any other comments on this, of course), let me know.

tldr: Suppose someone figures out how to build an AGI and solves the alignment problem at least to some degree. Would it then be helpful to distribute many copies of the AI and have them keep each other in check? I propose a mathematical model for thinking about this question and conclude that redundancy would probably help at first but diminish in effectiveness over time. Under some assumptions, it may actually be counterproductive.
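
As a quick illustration of the underlying intuition (a toy stand-in, not the actual model from the post): suppose each of n copies is misaligned with probability p, independently, and the system is compromised only when a strict majority of copies are misaligned at once. A common-cause term q, representing a failure mode shared by every copy (say, one inherited from a common training process), then puts a floor under how much redundancy can help. The symbols n, p, and q here are assumptions made for this sketch.

```python
from math import comb

def p_majority_misaligned(n: int, p: float) -> float:
    """Probability that a strict majority of n copies are misaligned,
    assuming each copy is independently misaligned with probability p."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def p_system_failure(n: int, p: float, q: float) -> float:
    """Common-cause model (an assumption of this sketch): with probability q
    a shared failure mode corrupts every copy simultaneously; otherwise the
    copies fail independently and redundancy works as intended."""
    return q + (1 - q) * p_majority_misaligned(n, p)

for n in (1, 3, 5, 9):
    print(n,
          round(p_system_failure(n, p=0.1, q=0.0), 4),
          round(p_system_failure(n, p=0.1, q=0.05), 4))
```

With q = 0, adding copies drives the failure probability down quickly (0.1 → 0.028 → 0.0086 for n = 1, 3, 5 at p = 0.1). With even a small q, the failure probability bottoms out near q no matter how many copies are added, which is the flavor of the "diminishing effectiveness" conclusion above.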