Zack_M_Davis answers What’s the theory of impact for activation vectors?

Zack_M_Davis 11 Feb 2024 22:23 UTC
2 points
I thought the idea was that steering abstractions the model learned without supervision circumvents the failure modes of optimizing against human feedback.
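To make "steering" concrete, here is a toy sketch, not anyone's actual method. It assumes the steering direction is the difference of mean activations between two sets of prompts, which is one common construction and may not be the one meant here. The activations are made-up numbers and the function names are hypothetical. The point it illustrates is that the intervention is a plain vector addition at inference time, with no reward model or human-rating loop being optimized against.

```python
# Toy sketch of activation steering. All names and numbers are hypothetical.

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_vector(acts_a, acts_b):
    """Difference of mean activations between two sets of prompts (assumed construction)."""
    mean_a = mean_vector(acts_a)
    mean_b = mean_vector(acts_b)
    return [a - b for a, b in zip(mean_a, mean_b)]

def steer(hidden, direction, scale=1.0):
    """Add a scaled steering direction to a hidden state at inference time."""
    return [h + scale * d for h, d in zip(hidden, direction)]

# Made-up activations standing in for a model's internal representations.
acts_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
acts_b = [[0.0, 0.0, 2.0], [0.0, 2.0, 2.0]]

direction = steering_vector(acts_a, acts_b)
steered = steer([0.5, 0.5, 0.5], direction, scale=2.0)
```

Nothing in this sketch involves a training signal from human raters. The model's existing representation is nudged along a direction read off from its own activations.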