This is really cool work! I particularly like the taxonomy of self-awareness. Have you seen whether steering awareness extends to introspection about unsteered activations?