Don’t use a neural net (or variants such as deep belief networks). The field has advanced considerably since the 1960s, and since the late 1980s there have been machine learning and knowledge representation structures that are comprehensible to humans and/or auditors, such as probabilistic graphical models. These would have to be first-class types of the virtual machine that implements the AGI if you are using auditing as a confinement mechanism. But that’s not much of a restriction: many AI techniques are already phrased in terms of these models (including Eliezer’s own TDT, for example), and others have simple adaptations.
How do you decide whether some interaction of a complex neural net is friendly or unfriendly?
It’s very hard to tell what a neural net or other complex algorithm is doing, even if you have logs.
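To make the comprehensibility point concrete, here is a minimal sketch (my own illustrative example, not from the original comment) of a two-node Bayesian network. Every parameter is a named, human-readable probability in an explicit conditional probability table, so an auditor can inspect each entry and check each inference step directly, in contrast with the opaque weight matrices of a neural net. The variable names and probabilities are assumptions chosen purely for illustration.

```python
# A tiny probabilistic graphical model: Rain -> WetGrass.
# Every parameter below is a labeled probability an auditor can read off.

# Prior: P(Rain)
P_rain = {True: 0.2, False: 0.8}

# Explicit conditional probability table: P(WetGrass | Rain)
P_wet_given_rain = {
    True:  {True: 0.9, False: 0.1},   # if it rained, grass is likely wet
    False: {True: 0.1, False: 0.9},   # if not, grass is likely dry
}

def posterior_rain_given_wet():
    """P(Rain=True | WetGrass=True) by exact enumeration (Bayes' rule)."""
    # Joint P(Rain=r, Wet=True) for each value of Rain
    joint = {r: P_rain[r] * P_wet_given_rain[r][True] for r in (True, False)}
    # Normalize over all values of Rain
    return joint[True] / sum(joint.values())

print(round(posterior_rain_given_wet(), 3))  # 0.18 / 0.26 ≈ 0.692
```

The point of the sketch is that the model’s entire “knowledge” sits in those two small tables, and the inference is an auditable arithmetic derivation rather than a forward pass through millions of unlabeled weights.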