To me a good definition for this is:
Get to a stage where you can write a computer program which can match the best AI at Go, where the program does no training (or equivalent) and you do no training (or equivalent) in the process of writing the software.
I.E. write a classical computer program that uses the techniques of the Neural Network based program to match it at Go.
This is the most intuitive answer to me, as well. It’s also extremely difficult, and it‘s unclear how it is going to be useful for doing alignment generally.
Perhaps one idea is to train AI to write legible code, then use human code review on it. This seems as safe as our current mode of software development if the AI is not actively obfuscating (a big assumption).