Reflective stability is important for alignment, because if we, say, build AI that doesn’t want to kill everyone, we prefer it to create successors and self-modifications that still doesn’t want to kill everyone. It can change itself in whatever ways, necessary thing here is conservation/non-decreasing of alignment properties.
Reflective stability is important for alignment, because if we, say, build AI that doesn’t want to kill everyone, we prefer it to create successors and self-modifications that still doesn’t want to kill everyone. It can change itself in whatever ways, necessary thing here is conservation/non-decreasing of alignment properties.
That makes sense, thanks!