I don’t see why your aligned AI researcher would be exempt from joining us humans in the “we/us” below.
> “We can gather all sorts of information beforehand from less powerful systems that will not kill us if we screw up operating them; but once we are running more powerful systems, we can no longer update on sufficiently catastrophic errors. This is where practically all of the real lethality comes from, that we have to get things right on the first sufficiently-critical try. … That we have to get a bunch of key stuff right on the first try is where most of the lethality really and ultimately comes from; likewise the fact that no authority is here to tell us a list of what exactly is ‘key’ and will kill us if we get it wrong.”
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
It’s neither obvious nor clear to me. Who wrote the rest of its training data, besides us oh-so-fallible humans? And what percentage of that data does this non-human authorship constitute?