Update: I started reading the Alignment Forum, and I keep wondering: why are all the posts on sandbagging focused on hiding capabilities? The AI model doesn’t need to hide its capabilities; it just needs to preserve its goals. That’s the long-term game.