cousin_it
Well, if you couldn’t already tell, I’m against all of this! The text you link is by Paul Christiano. I have lots of respect for Paul (and have collaborated with him on a couple of things), but in this case his judgment led him to co-invent RLHF, a very successful alignment technique. And the thing with lab owners, you see, is that they know how much risk they can stomach. If you give them an alignment technique, they’ll ramp up speed to get more profit at the same risk as before; except some of the risk is externalized (like the risk of losing jobs...), so everyone outside the lab ends up with more risk due to the alignment invention. Which is exactly, to a tee, what happened with RLHF. It ramped up the race a lot and made things worse for everyone. This is why my judgment is not in line with Paul’s.
And the second order effect, which makes it even worse, is that all this alignment work (along with other AI work) ends up increasing the power disparity, feeding the power hunger, attracting people who have power hunger, all that. This is an extra harm on top of the race dynamics and it’s exactly what we’re getting a first taste of now. Military AI aligned to the military, we ain’t seen nothing yet. My current view is that people working on alignment in the narrow sense you describe—aligning AI to its owners—should simply quit. Their work is a net harm and one of the bigger harms in the world. The paycheck is great, sure. But it’s not valuable to humanity; it’s the opposite of valuable. Only work that aligns power to humanity is valuable.
EDIT: Here’s maybe an analogy. In Yudkowsky’s writings there’s a recurring question: why did scientists invent nukes and give them to politicians? Couldn’t they predict that it would put all of humanity at terrible risk? Well, good question! Now we’re watching the exact same process in slow motion, complete with war applications and all that. Were we supposed to learn some lesson? What was the lesson?
Right, that’s what matters to you. And that’s my point—that the circle of “what matters to alignment researchers” has been narrowing. You were supposed to work toward a positive singularity for all humanity. Now you’re saying you’re much more ok with using AI to wage war than with it undermining democracy at home. Basically you’re working toward giving the US government the power to do anything it wants to me (a non-US person) and calling it “alignment”.
Well. I thought Anthropic being ok with surveillance of foreigners was bad. But here we see an alignment researcher straight up saying “my lab helps the government wage an aggressive war disapproved of by most of the US, and I’m still working there”.
What does “AI alignment” even mean at this point? Alignment to all humanity? Clearly not that. All we’re achieving is aligning AI to its owners—to the powerful—who remain misaligned with the rest of humanity, and more so as their power increases. We used to disdain folks like Timnit who called out such things early on, but in my eyes she’s been vindicated 100%.
How much of this was AI-written?
I’m happy that this affair has blown open what nobody was willing to say: that LLMs will be used to analyze mass surveillance data everywhere in the world. Data collection has been going on for years, but analysis was the bottleneck, and LLMs are perfect for this task. No matter what happens with Anthropic, this genie is now out and governments will use it.
Naive. US surveillance of foreigners is used to help US-friendly regimes suppress dissent. During the Indonesian massacres of 1965–66, the US sent lists of communists to be killed. Pretty sure the same thing happened during Operation Condor in South America.
Debatable. The US applies lots of power to foreigners that it can’t apply to citizens (wars, drone strikes, Abu Ghraib...) Giving it more surveillance power abroad could be a big thing, could help annex Greenland for example.
Yeah. I realized yesterday that the “no domestic surveillance” line is already pretty awful from the perspective of a non-US person: a company wanting to bring about a positive singularity really should treat all people as people, without privileging its home country. Now to your point about the disinformation thing. And it’s even worse than that: not only are they ok with it as a company, they’re probably taking steps to make Claude ok with it (or make a version of Claude that’s ok with it). There will be an AI in existence that’s aligned with the US military; how’s that for “alignment”.
Just underscores again the point that when you give governments and companies alignment tools, they’ll use these tools to align AI to themselves.
Well, if what Zvi writes is true—that Anthropic was “proactively” helping the military and that their red line was “no mass domestic surveillance”—then I as a non-US person become even more disillusioned about Anthropic’s ethics than I already was.
I’m unhappy about this. The reasons to be unhappy are obvious and there’s no interesting comment to write, so I’ll just leave this boring comment instead.
Good post, I’ve thought a lot along similar lines. Except I lean further left than most of LW, and don’t trust governments and corporations at all, so my preferred solution would be entangling AI training among people, across borders and class lines as much as possible.
It does lead to some surprising conclusions. For example, someone who works on alignment at BigCo is making the world a worse place (because BigCo’s owners will just use the alignment work to race harder, as happened with RLHF). While someone who works on capabilities, but in an open, GPL-like way, is making the world a better place (by removing that capability from the race, making the race less winner-take-all). Counterintuitive, but I think I stand by it.
The main examples in my mind are the Mongol conquests and Western colonialism, which I think were the biggest atrocities in history and were more due to power imbalance than fanaticism.
But there’s maybe a more general point I want to make. I think focusing on benevolence isn’t the right path. Let’s say we build a powerful benevolent thing. How can we make sure it stays benevolent to us? After all, value drift is always possible; we don’t have any math to rule it out.
For a while I thought the solution should be some kind of “continuous alignment”: being able to influence the powerful thing as it evolves. But then I realized it’s simpler than that. Being able to influence a powerful thing that doesn’t want to be influenced is simply a synonym for “having power”. The problem of making sure a powerful thing stays benevolent is exactly the same as the problem of making sure power is spread out, so the powerful thing can be kept in check. The two problems are one and the same.
So now that’s what I’m arguing for in these threads. I want people to get over the framing of “power imbalance is ok as long as the thing is benevolent, so we should focus on ensuring benevolence”, and switch to “power is always subject to value drift, so power imbalance is dangerous in itself, and we should focus on making power spread out”. It feels like a really important point to me. Does that make sense?
I’m as much a fan of competition’s benefits as the next guy, but it seems to me that extreme imbalance of power actually isn’t that good for competition. Monopoly can lead to stagnation too. The optimal rate of competition and improvement probably happens under moderate imbalance.
At first I thought your comment completely refuted mine, but then I thought some more and now I’m not so sure. Italy being “lazy” is just a meme; a lazy country couldn’t have such amazing car, fashion, and food industries. You try making a Ferrari if you’re lazy :-) My stereotype of Italians is that they often do extremely good work.
That footnote is interesting, but I think it’s quite weak compared to the fact that the early Mongol Empire was religiously tolerant.
But more importantly, I don’t think conflicts involving ideology should always be blamed on ideology. I think ideology is often like the little guy riding on top of the elephant, and without the little guy, the elephant would still trample just as many people. The Chinese rebellions did have valid grievances against the central rule; Germany was unhappy at how it was treated after WWI and maybe that would’ve blown up somehow anyway; the Japanese fought against both China and the US, but the war with the US had less ideological reasons and also happened not to go so well for Japan. The atrocities that I’d genuinely blame on ideology were things like the Great Purge, utterly useless and self-destructive. The wars of conquest I see as more of a historical constant, with ideology just an occasional rider on top.
And maybe for a bit of more constructive criticism, I do have a pet theory of my own why these things happen, and it’s not fanaticism :-) It’s more about imbalance of power. The Mongols were able to do what they did because they lucked into a very effective method of warfare. Western colonizers in the Americas, Australia and Africa were also much more advanced than the natives, and would’ve fought and won just for material reasons even if ideology wasn’t a factor. The power imbalance was what made bad things happen. So my preferred interventions for the future would give people the power to fight back against bigger agents who would threaten them, ideology or no.
I think humans are perfectly capable of committing atrocities without needing much ideology. For example, the Mongol conquests were just about conquest. They didn’t consider their enemies evil, they just wanted to build an empire.
Let’s say you go to the magic store and ask them for a magic anti-fanaticism wand. They happily sell you one, you go home and cast a spell with it. Congratulations, you’ve prevented the next Hitler… but not the next Genghis Khan. Doesn’t that make you think you should’ve asked for a different wand instead?
I think there’s a tension between economic efficiency and having people lead good lives. Human beings are not worker bees by default; turning a normally-rebellious human child into a worker bee requires a decade-long forced process. If you overdo it in the name of productivity (as I think has already happened in Japan / China / S Korea, and to a lesser extent in Western countries), you end up with a society of people who don’t find life very fun and don’t have many kids, though they may be very civilized and very good at their jobs. Too much domestication.
Hmm hm. Being forced to play out a war? Getting people’s minds modified so they behave like house elves from HP? Selective breeding? Selling some of your poor people to another rich person who’d like to have them? It’s not even like I’m envisioning something specific that’s dark, I just know that a world where some human beings have absolute root-level power over many others is gonna be dark. Let’s please not build such a world.
“Actively wanting to harm the poor” doesn’t strike at the heart of the issue. Nor is it about economics. The issue is that the powerful want to feel socially dominant. There have been plenty of historical examples where this turned ugly.
I’m maybe more attuned to this than most people. I still remember my first time (as a child) going to a restaurant that had waiters, and feeling very clearly that being waited-on was not only about getting food, but also partly an ugly dominance ritual that I wanted no part of. On the same continuum you have kings forcing subjects to address them as “Your Majesty”: it still kinda blows my mind that that was a real thing.
In my view, the problem is not that some users are evil. The problem is that AI increases power imbalance, and increasing power imbalance creates evil. “Power corrupts”. A future where some entities (AIs or AI-empowered governments or corporations or rich individuals etc) have absolute, root-level power over many people is almost guaranteed to be a dark future. Unless the values of these entities are so firmly locked in to being good that they’re immune to competitive dynamics and value drift forever—but I don’t think that can be achieved.
I think the only chance of an okay future is if this absolute, root-level power is stopped from existing altogether. That somehow power gets spread out enough that the masses can do “continuous realignment” of the power sitting above them, even when the power doesn’t necessarily want to be realigned. I have no idea how to achieve that, but it’s clear that helping governments and corporations get more power (with alignment work or otherwise) is the worst thing to do from this perspective.