IDA includes looking inside the overseen agent: ‘As described here, we would like to augment this oversight by allowing Bⁿ⁻¹ to view the internal state of Aⁿ.’ (ALBA: An explicit proposal for aligned AI) If we can extract enough information from that internal state, we can avoid inner misalignment. Doing so is difficult, however; the difficulty is discussed in The informed oversight problem.