And most of history looks suffused with ruthless sociopathy to my eye.
This is the part that always confuses me about “alignment”, and it boils down to “aligned to who?”
The AI lab CEOs? I wouldn’t trust most of them with control of a superhuman intelligence. On his best day, Dario Amodei looks like the protagonist of a Greek tragedy about to be destroyed by the gods. And I wouldn’t trust Sam Altman with my lunch money.
The government? I’m sure the government can be trusted with control of a superhuman intelligence. /s
The voting public? I’m sure this is really fun, unless you’re trans, or an immigrant, or belong to the Out Group. In which case, an AI aligned to the popular vote means you’re going to get “cured” of whatever society doesn’t like this year.
Now, I happen to personally believe that alignment of superintelligent, learning, goal-seeking entities is impossible. Not “difficult” or “it might take decades”, but flat out impossible. An AI might like humans enough to keep us as pets, but that will be the AI’s decision, not ours. Dogs have approximately no control over their relationship with humans, and I figure that “humans as house pets” is the absolute best possible result of building superintelligent AI. My P(not doom|someone builds superintelligence) is about 1/6, and nearly all of that 1/6 is placed on “humans as house pets.”
But if we could control AIs? Those AIs would be controlled by powerful humans, the same sorts of people who had warm personal relationships with Epstein, and who had zero problems with Epstein trafficking and raping children. Given a choice between “superhuman AIs aligned to the Epstein class” and getting paperclipped, I’d go with the paperclips.
The only winning move is not to build superintelligence.
These are serious questions. In short, I have a view of human nature that’s somewhat more optimistic than yours.
I don’t think leaving humans in charge of the world is obviously a win either. It does look to me like the arc of history is bending toward justice, but it’s happening slowly and in fits and starts. And we could all be dead before we get to a stable just society. This isn’t really an argument for building ASI; I think we probably shouldn’t, or at least not this fast.
But it looks like we’re going to.
The big advantage of building intent-aligned AGI (if we can manage that instead of building a misaligned ASI that kills us all) is that it makes being good to people vastly easier, essentially completely free. You just tell your ASI, “Okay, fine, go make the world better for people. Tell me how you’d do it and I’ll choose some options.”
This lowers the bar for how good someone has to be to benefit humanity to just above zero. If they have more inclination to be helpful than harmful, that’s all it takes.
No human who’s ever lived has been in that position. Even the most powerful have had to worry about losing their power, and about themselves and their loved ones dying painfully and fairly soon.
So, strangely, I wouldn’t trust Sam Altman with my lunch money, but I would guess he’d probably produce a very good future if he were to wind up god-emperor for eternity. The exposés I’ve seen don’t claim he’s a particularly vengeful person. We’ll just have to celebrate Samday every week :)
There are individuals with what I think of as a negative empathy-sadism balance, but they’re pretty rare. Sociopathic individuals do seem to be overrepresented in the halls of power, but even there I think we’ve got pretty good odds of minimally good people winding up in charge of ASI.
This is not a scenario I’m comfortable with. If a sadistic individual gets control of the future, it could be worse than death: a permanent state of suffering. But it would take a person who is both very selfless and competent to launch such a thing successfully. I’d almost rather see an attempt at value-aligned AGI.
I’m not sure how to improve our odds; getting good people into power is an old challenge, and I don’t know of any new methods.
What do you think of the Meaning Alignment Institute’s (MAI) “democratic fine-tuning (DFT)” work on eliciting moral graphs from populations? e.g. this post from Oct ’23 (primer here):
We report on the first run of “Democratic Fine-Tuning” (DFT), funded by OpenAI. DFT is a democratic process that surfaces the “wisest” moral intuitions of a large population, compiled into a structure we call the “moral graph”, which can be used for LLM alignment.
We show bridging effects of our new democratic process. 500 participants were sampled to represent the US population. We focused on divisive topics, like how and if an LLM chatbot should respond in situations like when a user requests abortion advice. We found that Republicans and Democrats come to agreement on values it should use to respond, despite having different views about abortion itself.
We present the first moral graph, generated by this sample of Americans, capturing agreement on LLM values despite diverse backgrounds.
We present good news about their experience: 71% of participants said the process clarified their thinking, and 75% gained substantial respect for those across the political divide.
Finally, we’ll say why moral graphs are better targets for alignment than constitutions or simple rules like HHH. We’ll suggest advantages of moral graphs in safety, scalability, oversight, interpretability, moral depth, and robustness to conflict and manipulation.
and their more recent full-stack alignment vision? I ask because I’ve asked myself the exact same question, and MAI’s actual DFT work above, getting Republicans and Democrats to agree on hot-button questions, seemed like the only line of work getting concrete results.
That said, I do lean towards your “the only winning move is not to build superintelligence” take, I suspect because I was born and raised in a country that until a few decades ago was a British colony, so I am biased to view your threat model description as obviously correct. So I’m guessing your answer to my question above is “who cares what MAI is working on, aligning ASI is impossible”?
What do you think of the Meaning Alignment Institute’s (MAI) “democratic fine-tuning (DFT)” work on eliciting moral graphs from populations?
Interesting! I will need to read through this in more detail, to get an idea of their approach. I’m glad someone is trying to do something in this space.
My objections to other approaches to democratic governance tend to break down roughly as follows:
I fear that democratic governance of superintelligence is about as likely to succeed as chimpanzees coming up with elaborate schemes to democratically manage Homo sapiens for the benefit of chimps. No matter how careful and clever the chimps are, they’re going to fail. They don’t even understand 99% of what’s going on, so how could they hope to manage it?
We will not, in practice, actually attempt any such governance scheme. The Chinese labs won’t, because China doesn’t even believe in Western notions of democracy and human rights. OpenAI has recently gutted its existing non-profit governance structure in order to reduce the risk of anyone attempting to govern it. Anthropic, out of all the labs, just might try. But the US government is currently trying to break Anthropic and bring them to heel by threatening to designate them as a supply chain risk (like Huawei) unless they agree to support “all legal uses,” potentially including things like fully autonomous killbots and domestic surveillance. The “supply chain risk” designation, as I understand it, would mean that no Anthropic customer would be allowed to do business with the US government. Perhaps I’ve misunderstood this specific situation, but in the end, Anthropic is subject to the people with the guns. And the people with the guns do not necessarily want democratic oversight. So in practice, no, the billionaires and politicians will almost certainly not agree to some clever democratic governance system.
Even if we could somehow control superintelligence and if we could somehow place it under democratic control, I don’t especially trust democratic control. Why? Well, I’m bi, my friends are trans, and I’m old enough to remember the 1980s. Had someone proposed a plan like, “LGBT+ people are mentally ill, and we can cure them by nonconsensually rewriting their minds,” it’s entirely possible that the public might have voted for that.
Finally, democracy is inherently unstable. About 20-25% of people appear to be “authoritarian followers”, which means they’re pretty happy to vote for a strongman. This number increases in times of fear and crisis. (It went up after 9/11, for example.) And another big chunk of the population can be moved by propaganda, or barely understand anything at all about politics. So historically, a number of 20th century democratic nations voted in the leaders who destroyed their democracy. This can be fixed; Germany is a democracy again today. But I expect democratic governance of superintelligence would be subject to similar risks, and in the case of superintelligence, you may not be able to fix your mistakes.
So a plan like MAI’s is critically dependent on a number of assumptions:
1. We can control superintelligence.
2. We have sufficiently good democratic control over the rich and the powerful to make sure they don’t wind up controlling superintelligence.
3. If the people do succeed in getting democratic control over superintelligence, they won’t vote it away, and they won’t democratically decide to do horrible things to unpopular minorities.
So from my perspective, MAI’s plan is a “hail Mary” plan. But we’re pretty deep in “hail Mary” territory, so I’m not opposed to placing bets on what look like unlikely outcomes.
Similarly, as far as I can tell, Dario Amodei’s current plan for Anthropic is “build superintelligence as fast as we can, do our very best to make it like humans, and expect to totally lose all human control within 5-20 years.” Personally, I feel like this is the least horrible version of the worst idea in human history. Like, obviously, no, we should not do this. But if we’re going to do this, Anthropic is at least thinking about the real issues. They know that humans are likely to lose control, but they’re basically hoping we can wind up as beloved house pets.
I still think the best plan is “just don’t build something vastly smarter than us with the ability to learn,[1] pursue goals and replicate.” One obvious objection to my plan is that we’re probably going to go right ahead and build superintelligence anyway. Which is why I am sympathetic to long-shot plans that might have an outside chance of working.
But I still prefer “just don’t build superintelligence.” Or, failing that, delay it. Emotionally, I’m treating it sort of like a diagnosis of terminal cancer for me and everyone I love. Even a remission of several years would be of immense value. And delay also gives some of the hail Mary plans a slightly better chance of working, or of the public realizing that maybe they don’t want to be “beloved house pets” of minds no human can possibly understand.
[1] Learning is essentially a form of self-modification. Combined with differential replication of more successful entities, this gives you natural selection.
Yeah. The true nature of power is shown by more horrors of history than can be counted, but Epstein and factory farming are especially illustrative to me.