I’ve been thinking about alignment of subsystems in a very similar style, and I’m really excited to see someone else thinking along these lines. I started a comment with my own thoughts on this approach, but it quickly got out of hand, so I made a separate post: https://www.lesswrong.com/posts/AZfq4jLjqsrt5fjGz/formalizing-alignment
I’d be keen to get any sort of feedback.