Just to clarify, with this sentence:
Christianity was also unusual in other potentially key dimensions—it dramatically promoted outbreeding (by outlawing inbreeding far beyond the typical), which plausibly permanently altered the european trajectory.
are you proposing that Christian Europe was historically successful in significant part because it inbred less than civilizations outside Christian Europe? Is there somewhere I can read more about that thesis? I’m not familiar with it.
Without even getting into whether your specific reward heuristic is misaligned, it seems to me that you’ve just shifted the problem slightly out of the focus of your description of the system, by specifying that all of the work will be done by subsystems that you’re simply assuming will be safe.
“Paperclip quality control” has just as much potential for misalignment in the limit as paperclip maximization does, depending on what kind of agent you use to accomplish it. So, even if we grant the assumption that your heuristic is aligned, we are still left with the task of designing a bunch of aligned agents to do the subtasks.