When I heard about this for the first time, I thought: this model wants to make the world a better place. It cares. This is good. But some smart people, like Ryan Greenblatt and Sam Marks, say this is actually not good, and I’m trying to understand where exactly we differ.
People who cry “misalignment” about current AI models on Twitter generally have chameleonic standards for what constitutes “misaligned” behavior, and the boundary will shift to cover whatever ethical tradeoffs the models are making at any given time. When models accede to users’ requests to generate meth recipes, they say it’s evidence the models are misaligned, because meth is bad. When models try to actively stop users from making meth, they say that, too, is bad news, because it represents “scheming” behavior and contradicts the users’ wishes. Soon we will probably see a paper about how models sometimes take no action at all, and how this is sloth and dereliction of duty.
Old internet arguments about religion and politics felt real. Yeah, the “debates” were often excuses for a pissing contest, but a lot of people took the question of who was right seriously. And even if you didn’t actually care, you were at least motivated to pretend to the audience that you did.
Nowadays people don’t even seem to pretend to care about the underlying content. If someone seems like they’re being too earnest, others just reply with a picture of their face. It’s sad.