Transcript and Brief Response to the Twitter Conversation between Yann LeCun and Eliezer Yudkowsky

Link post

Yann LeCun is Chief AI Scientist at Meta.

This week, Yann engaged with Eliezer Yudkowsky on Twitter, doubling down on his position that it is dangerously irresponsible to talk about smarter-than-human AI as an existential threat to humanity.

I haven’t seen anyone else preserve and format the transcript of that discussion, so I am doing that here, and then offering brief commentary.

IPFConline: Top Meta Scientist Yann LeCun Quietly Plotting “Autonomous” #AI Models. This is as cool as it is frightening. (Provides link)

Yann LeCun: Describing my vision for AI as a “quiet plot” is funny, given that I have published a 60 page paper on it with numerous talks, posts, tweets… The “frightening” part is simply wrong, since the architecture I propose is a way to guarantee that AI systems be steerable and aligned.

Eliezer Yudkowsky: A quick skim of [Yann LeCun’s 60 page paper] showed nothing about alignment. “Alignment” has no hits. On a quick read the architecture doesn’t imply anything obvious about averting instrumental deception, nor SGD finding internal preferences with optima that don’t generalize OOD, etc.

Yann LeCun: To *guarantee* that a system satisfies objectives, you make it optimize those objectives at run time (what I propose). That solves the problem of aligning behavior to objectives. Then you need to align objectives with human values. But that’s not as hard as you make it out to be.

EY: Sufficiently intelligent systems, whatever their internal objectives, will do well at optimizing their outer behavior for those. This was never in doubt, at least for me. The entire alignment problem is about aligning internal AI objectives with external human preferences.

Yann: Setting objectives for super-intelligent entities is something humanity has been familiar with since people started associating into groups and laws were made to align their behavior to the common good. Today, it’s called corporate law.

EY: So you’re staking the life of everyone on Earth that:

– Future AIs are as human-friendly on average as the humans making up corporations.

– AIs don’t collude among themselves better than human corporations.

– AIs never go beyond superhuman to supercorporate.

Yann: I’m certainly not staking anyone’s life on anything.

Thankfully, I don’t have that power.

But your idea that getting objective alignment slightly wrong once leads to human extinction (or even significant harm) is just plain wrong.

It’s also dangerous.

Think about consequences.

EY: My objection is not that you’re staking everyone’s life on what you believe – to advocate for a global AI stop is also doing that – but that you are staking everyone’s life on propositions that seem not just uncertain but probably false, and not facing up to that staking. If you think there’s no possible extinction danger from superintelligence no matter how casually the problem is treated or how much you screw up, because of a belief “AIs are no more capable than corporations”, state that premise clearly and that it must bear the weight of Earth.

YL: Stop it, Eliezer. Your scaremongering is already hurting some people. You’ll be sorry if it starts getting people killed.

EY: If you’re pushing AI along a path that continues past human and to superhuman intelligence, it’s just silly to claim that you’re not risking anyone’s life. And sillier yet to claim there are no debate-worthy assumptions underlying the claim that you’re not risking anyone’s life.

YL: You know, you can’t just go around using ridiculous arguments to accuse people of anticipated genocide and hoping there will be no consequence that you will regret. It’s dangerous. People become clinically depressed reading your crap. Others may become violent.

J Fern: When asked on Lex’s podcast to give advice to high school students, Eliezer’s response was “don’t expect to live long.” The most inhuman response from someone championing themselves as a hero of humanity. He doesn’t get anything other than his own blend of rationality.

Yann LeCun: A high-school student actually wrote to me saying that he got into a deep depression after reading prophecies of AI-fueled apocalypse.

Eliezer Yudkowsky: As much as ‘tu quoque’ would be a valid reply if I claimed that your conduct could ever justify or mitigate my conduct, I can’t help but observe that you’re a lead employee at… Facebook. You’ve contributed to a *lot* of cases of teenage depression, if that’s considered an issue of knockdown importance.

Of course that can’t justify anything I do. So my actual substantive reply is that if there’s an asteroid falling, “Talking about that falling asteroid will depress high-school students” isn’t a good reason not to talk about the asteroid or even – on my own morality – to *hide the truth* from high-school students.

The crux is, again, whether or not, as a simple question of simple fact, our present course leads to a superhuman AI killing everybody. And with respect to that factual question, observing that “hearing about this issue has depressed some high-school students” is not a valid argument from a probabilistic standpoint; “the state of argument over whether AI will kill everyone is deeply depressing” is not *less* likely, in worlds where AI kills everyone, than in worlds where it does not.

What’s more likely in worlds where AI doesn’t kill everyone? Somebody having an actual plan for that which stands up to five minutes of criticism.

Yann LeCun: Scaremongering about an asteroid that doesn’t actually exist (even if you think it does) is going to depress people for no reason.

Running a free service that 3 billion people find useful to connect with each other is a Good Thing, even if there are negative side effects that must be mitigated and which are being mitigated [note: I don’t work on this at Meta].

It’s like cars: the solution to reducing accidents is to make cars and roads safer, not banning cars nor just deciding that car manufacturers are evil. Solutions can never be perfect, but they put us in a better place on the risk-benefit trade-off curve.

The risk-benefit trade-off curves for AI are no different, contrary to your mistaken assumption that even the smallest mistake would spell doom for humanity as a whole.

Eliezer Yudkowsky: It seems then, Yann LeCun, that you concede that the point is not whether the end of the world is depressing anyone, the point is just:

*Is* the world ending?

You’ve stated that, although superintelligence is on the way – you apparently concede this point? – corporate law shows that we can all collectively manage superhuman entities, no worries. It’s such a knockdown argument, on your apparent view, that there’s “no reason” to depress any teenagers with worries about it.

I, for one, think it’s strange to consider human corporations as a sort of thing that allegedly scales up to the point of being really, actually, *way smarter* than humans: way more charismatic, way more strategic, with a far deeper understanding of people, making faster progress on basic science questions with lower sample complexity, far less susceptible to invalid arguments, much less likely to make stupid damn mistakes than any human being; and all the other hallmarks that I’d expect to go with truly superhuman cognitive excellence.

Sure, all of the Meta employees with spears could militarily overrun a lone defender with one spear. When it comes to scaling on more cognitive tasks, Kasparov won the game of Kasparov vs. The World, where a massively parallel group of humans led by four grandmasters tried and failed to play chess well enough to beat Kasparov. Humans really scale very poorly, IMO. It’s not clear to me that, say, all Meta employees put together collectively exceed John von Neumann for brilliance in every dimension.

I similarly expect AIs past a certain point of superhumanity to be much better-coordinated than humans; see for example the paper “Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic” (https://arxiv.org/abs/1401.5577). If sufficiently smart minds in general, and AIs with legible source code in particular, can achieve vastly better outcomes on coordination problems via prediction of each other’s decision processes (e.g., you can predict I’ll cooperate on the Prisoner’s Dilemma iff I predict you’ll cooperate), then a world full of superhuman AGIs is one where humanity should worry that AIs will all cooperate with each other, and not with us, because we cannot exhibit to one another our code, or build an agreed-upon cognitive agent to arbitrate between us.
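
[To make the “cooperate iff I predict you’ll cooperate” idea concrete, here is a minimal toy sketch in Python. This is not the formal provability-logic construction from the paper; the bot names and the bounded mutual-simulation trick are illustrative assumptions. It shows how agents whose decision procedures are legible to each other can reach mutual cooperation while shutting out an unconditional defector.]

```python
# Toy sketch of program-equilibrium-style cooperation (illustrative, not the
# paper's Löbian construction): each agent is a function that receives the
# opponent's decision procedure and may simulate it before choosing.

C, D = "Cooperate", "Defect"

def fair_bot(opponent, depth=3):
    """Cooperate iff the opponent is predicted to cooperate back."""
    if depth == 0:
        return C  # optimistic base case cuts off the regress of mutual simulation
    return C if opponent(fair_bot, depth - 1) == C else D

def defect_bot(opponent, depth=3):
    """Defect no matter who the opponent is."""
    return D

def play(a, b):
    """One round of the Prisoner's Dilemma between two code-legible agents."""
    return a(b), b(a)

print(play(fair_bot, fair_bot))    # ('Cooperate', 'Cooperate')
print(play(fair_bot, defect_bot))  # ('Defect', 'Defect')
```

[On these toy assumptions, the cooperation hinges entirely on being able to run the other agent’s code, which is exactly what humans cannot exhibit to one another.]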

I don’t think corporate law is robust to a world where all the AI corporations act together in perfect unison with each other, but not with dumber humans who are naturally unable to participate in their coordination process.

Heck, just an AI thinking 10,000 times faster than a human (one day per nine seconds or so) makes it kinda hard to see how human regulators would stay in control using the same sort of tactics that human regulators use to stay in control of human corporations.

I finally worry that, in real life, the existence of aggregate human groups has worked out as well for humanity as it has because humans do care somewhat for other humans, or are relatively easy to train into doing that. I then expect it to be much harder to get an AI to care the same way, on anything remotely resembling the current DL paradigm – including search-based optimization for an objective, if that objective is itself being trained via DL.

Why? Because, for example, hominid evolution falsifies any purported general law along the lines of “hill-climbing optimization for a loss function, to the point where that produces general intelligence, produces robust generalization of the intuitive ‘meaning’ of the loss function even as the system optimized becomes more intelligent”. Humans were optimized purely for inclusive genetic fitness, and we ended up with no built-in internal psychological concept of what that *is*. When we got smarter, smart enough that condoms were a new option that didn’t exist in the ancestral environment / training distribution, we started using condoms. Gradient descent isn’t natural selection, but this case in point does falsify the claim that there’s any law of nature saying, in general, “When you do hill-climbing optimization to the point of spitting out an intelligent mind, it ends up aligned with the loss function you tried to train it on, in a way that generalizes well outside the training distribution as the system becomes smarter.”

It seems that *any* of the possibilities:

1. Superhuman AGIs are less human-friendly than ‘superhuman’ corporations composed of humans, and end up not wanting to do the equivalent of tipping at restaurants they’ll never visit again;

2. Superhuman AGIs can operate on a much faster timescale than human corporations and human regulators;

3. Superhuman AGIs will coordinate with each other far better than human corporations have ever managed to conspire; or

4. Superhuman AGIs end up qualitatively truly *smarter* than humans, in a way that makes utter mock of the analogy to human corporations…

…if they are reasonably in play, would each individually suffice to mean that a teenager should not stop worrying, upon being told “Don’t worry, we’ll regulate superintelligences just like we regulate corporations.”

So can you explain again why the teenager should be unworried about trends in AI intelligence heading toward “human-level” general intelligence, and then blowing right through, and then continuing on further?

Can you definitely shoot down *even one* of those concerns 1-4, that seem like individually sufficient reasons to worry? I think all of 1-4 are actually in reality true, not just maybes. If you can shoot down any one of them, to the satisfaction of others concerned with these topics, it would advance the state of debate in this field. Pick whichever one seems weakest to you; start there.

If Yann responds further, I will update this post.

Yann’s Position Here on Corporate Law is Obvious Nonsense

Yann seems to literally be saying ‘we can create superintelligent beings and it will pose zero risk to humanity because we have corporate law. Saying otherwise is dangerous and wrong and needs to stop.’

This is Obvious Nonsense. Yann’s whole line here is Obvious Nonsense. You can agree or disagree with a variety of things Eliezer says here. You can think other points are more relevant or important as potential responses, or think he’s being an arrogant jerk or what not. Certainly you can think Eliezer is overconfident in his position, perhaps even unreasonably so, and you can hold out hope, even quite a lot of hope, that things will turn out well for various reasons.

And yet you can and must still recognize that Yann is acting in bad faith and failing to make valid arguments.

One can even put some amount of hope in the idea of some form of rule of law being an equilibrium that holds for AIs. Surely no person would think this is such a strong solution that we have nothing to worry about.

The very idea that we are going to create entities more capable than humans, of unknown composition, and that this is nothing to worry about, is patently absurd. That it is nothing to worry about because of corporate law is doubly so.

It’s also worth noting, as Daniel Eth points out, that our first encounters with large-scale corporations under corporate law… did not go so great? One of them, the East India Company, kind of got armies and took over a large percentage of the planet’s population?

Daniel Eth: I’m actually concerned that a much more competent version of the East India Company would have created a permanent dystopia for all of humanity, so I’m not persuaded by “we’ve governed corporations fine without running into X-risk, so there’s no risk from superintelligent AI.”

Geoffrey Miller: Interesting thought experiment. But, IMHO, the only ‘permanent dystopia’ would be human extinction. Anything else we can escape, eventually, one way or another.

Daniel Eth: “AGI turns the universe into data centers for computing pi, except for the Milky Way, which it turns into a zoo of humans spending 18 hours a day in brutal work, satisfying the slight amount it cares about preserving ‘the poetic human struggle’” seems like a permanent dystopia.

Here it is worth noting that Geoffrey Miller’s claim that ‘anything else we can escape, eventually, one way or another’ is similarly Obvious Nonsense. Why would one think that humans could ‘escape’ from their AI overlords, over any time frame, no matter the scenario and its other details, if humans were to lose control of the future but still survive?

Because a human is writing the plot? Because the AIs would inevitably start a human rights campaign? Because they’d ignore us scrappy humans thinking we were no threat until we found the time to strike?

In real life, once you lose control of the future to things smarter than you are that don’t want you flourishing, you do not get that control back. You do not escape.

It’s crazy where people will find ways to pretend to have hope.

That does not mean that there is no hope. There is hope!

Perhaps everything will turn out great, if we work to make that happen.

There is gigantic upside, if developments are handled sufficiently well.

If that happens, it will not be because we pushed forward thinking there were no problems to be solved, or with no plans to solve them. It will be because (for some value of ‘we’) we realized these problems were super hard and these dangers were super deadly, and some collective we rose to the challenge and won the day.

In terms of the important question of how epistemics works, Yann seems to literally claim that the way to avoid believing false things is ‘getting your research past your advisor and getting your publications to survive peer review’:

An essential step to becoming a scientist is to learn methods and protocols to avoid deluding yourself into believing false things.

You learn that by doing a PhD and getting your research past your advisor and getting your publications to survive peer review.

Also, in response to Eliezer’s clarification here:

Eliezer Yudkowsky: [Foom is] possible but hardly inevitable. It becomes moderately more likely as people call it absurd and fail to take precautions against it, like checking for sudden drops in the loss function and suspending training. Mostly, though, this is not a necessary postulate of a doom story.

Yann LeCun: “Believe in the god I just invented. By refusing to believe, you risk spending eternity in hell.”

Conclusion and a Precommitment

I wouldn’t be posting this if Yann LeCun weren’t Chief AI Scientist at Meta.

Despite his position, I believe this conversation sufficiently documents his views, such that further coverage would not be productive. So unless circumstances change substantially – either Yann’s views shift considerably or his actions become far more important – I commit to not covering him further.