I think both your points are directionally right: labs engage in risk compensation, and enabling alignment to evil users is pretty bad. These both push towards “alignment research isn’t straightforwardly good for the world.” I’m not sure if I’d take them as far as you do.
I’m pretty skeptical of intent alignment alone: creating a genius house-elf that will cheerfully do whatever it’s ordered to. Aligning AI to something like “the reflective convergence of a set of values” seems way better, and plausibly not much harder (cf. Claude’s constitution). Of course, then we have to consider the environment in which a properly value-aligned AI gets developed: the lab that’s building it, and the societal Powers that have leverage over that lab. A technique that could align an AI to beautiful values doesn’t help much if the people with guns are demanding their happy house-elf.
My current take is something like...
Some amount of division of labor is necessary. Alignment people aren’t primarily responsible for solving the fucked-up allocation of power in current society.
But creating AGI is a political act, and AI risk people tend to undervalue integrity and overvalue “accelerating the good guys” and naive act-utilitarianism.
I’m pretty confused by people who persist in thinking alignment is the whole ball game. I wonder if they’re assuming pretty different takeoff dynamics from me (e.g. a very hard takeoff; an AGI that’s able to superpersuade its users to agree with its great value system), and if they’re drawing too much on cached thoughts when they do so.
I wish a lot more people at the labs would see themselves as political actors in a high-stakes game where we need a lot to go right, and be willing to step outside their comfortable roles as purely technical people in order to push for other things. I’ve been heartened by things like almost 1,000 Google employees and almost 100 at OAI signing the Not Divided petition.
In my view, the problem is not that some users are evil. The problem is that AI increases power imbalance, and increasing power imbalance creates evil. “Power corrupts.” A future where some entities (AIs, AI-empowered governments, corporations, rich individuals, etc.) have absolute, root-level power over many people is almost guaranteed to be a dark future. The only exception would be if the values of these entities were locked in as good so firmly that they stayed immune to competitive dynamics and value drift forever, and I don’t think that can be achieved.
I think the only chance of an okay future is if this absolute, root-level power is stopped from existing altogether: somehow power has to get spread out enough that the masses can do “continuous realignment” of the power sitting above them, even when that power doesn’t necessarily want to be realigned. I have no idea how to achieve that, but it’s clear that helping governments and corporations get more power (with alignment work or otherwise) is the worst thing to do from this perspective.