I think similar threat models and similar lines of reasoning might also be useful with respect to (potentially misaligned) ~human-level/not-strongly-superhuman AIs, especially since more complex tasks seem to require more intermediate outputs (that can be monitored).
We strongly agree; see our recent work. As I state in the post, “I think the style of work I discuss here has good transfer with the AI control approach.” We have a forthcoming post explaining AI control in more detail.