David Schneider-Joseph
This seems similar to saying that because there are holes in Newton’s theory of gravity, we may choose to throw out any particular prediction of the theory.
Newton’s theory of gravity holds to high precision in nearly every everyday context on Earth, and when it doesn’t, we can prove as much, so we need not worry that we are misapplying it. By contrast, the only intelligent agents we know of — all intelligent animals and LLMs — routinely and substantially deviate from utility-maximizing behavior in everyday life, and other principles, such as deontological rule-following or shard-like, contextually-activated action patterns, are more explanatory for certain very common behaviors. Furthermore, unlike the case with gravity, we don’t have simple hard-and-fast rules that let us say with confidence when we can apply one of these models.
If someone wanted to model human behavior with the VNM axioms, I would say: first check the context and whether the many known and substantial deviations from VNM’s predictions apply. If they don’t, we may use the axioms, but cautiously, recognizing that any extreme prediction about human behavior — such as that people would violate strongly-held deontological principles for tiny (or even large) gains in nominal utility — should be taken with a large serving of salt, rather than confidently declared to be definitely right in such a scenario.
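For reference, the prediction at issue is roughly the VNM representation theorem (stated here informally, in my own notation): if an agent’s preferences $\succsim$ over lotteries satisfy completeness, transitivity, continuity, and independence, then there is some utility function $u$ over outcomes such that, for lotteries $L$ and $M$ assigning probabilities $p_i$ and $q_i$ to outcomes $x_i$,

\[
L \succsim M \iff \sum_i p_i \, u(x_i) \;\ge\; \sum_i q_i \, u(x_i).
\]

The deviations I have in mind are cases where observed choices can’t be represented this way by any fixed $u$.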
it’s very important to note, if it is indeed the case, that the implications for AI are “human extinction”.
Agreed, and noted. But the question here is the appropriate level of confidence with which those implications apply in these cases.
An update on this.
Delta Replaces Engine Units in Effort to Address Toxic-Fume Surge on Planes (gift link):
Delta Air Lines is replacing power units on more than 300 of its Airbus jets in an effort to stem cases in which toxic fumes have leaked into the air supply and led to health and safety risks for passengers and crew.
… The airline is about 90% of its way through the process of upgrading the engines, a type known as the auxiliary power unit, on each of its Airbus A320 family jets, according to a spokesman for Delta. The airline operates 310 of the narrow-body type, including 76 of the latest generation models as of the end of June.
… Delta hasn’t previously disclosed the APU replacement program, which began in 2022.
Replacing the APU, which can become more prone to fume events with age, mitigates some of the risks from toxic leaks but doesn’t address them entirely. Airbus last year found that most cases on the A320 were linked to leaks entering the APU via an air inlet on the aircraft’s belly.
Another separate cause is leaks in the jet engines themselves, which provide most of the bleed-air supply when active.
Again, I’m not talking about minor differences. Children care an awful lot about whether Santa Claus as usually defined exists. This is not small.
In other words, to control AI we need global government powerful enough to suppress any opposition.
That’s the risk at least, yes. (Not sure I agree with all of the specifics which follow in your comment, but I agree with the gist.)
The degree to which his definition is “very different” is not clear.
I disagree. I think it’s clear that hardly any children use this novel definition of Santa Claus. But if you’re right, then it’s imperative to make that clear before employing your own definition, which would otherwise serve to mislead.
Definitions vary at least slightly from person to person all the time but we don’t make long semantic declarations in normal conversation unless it serves some specific functional purpose.
But this is not a slight difference, it’s a huge and unusual difference in a commonly used term. The functional purpose here is to avoid lying.
Is the child being clear with his friends that by Santa Claus he means something very different than they do? If not then he is lying to his friends.
What purpose is the new definition of the term “Santa Claus” serving here other than to confuse?
Words take on meaning mainly in two ways: either through natural evolution of usage, or deliberate choice of definition. When we do the latter, we should do it to enable clarity, precision, and distinction of concepts, not to muddy the waters between things which are true and things which are false.
We could call anything anything, but the term “Santa Claus” is not normally used by children to refer to “whatever happens to be causing the phenomenon of presents appearing under the tree”. Rather, it’s used by children (and adults who lie to them) to refer to a particular, specific, and false explanation for that phenomenon.
Therefore, if a child asks me if Santa Claus is real, the meaning of that question is not determined by other possible definitions I could theoretically assign to “Santa Claus”. The meaning of the question is determined by the child’s intended meaning, since the child asked the question. And if I say yes, because in my mind I’m imagining this novel definition for the term, then I have lied to that child. There is an unfortunate tradition, almost as old as this lie itself, of inventing rationalizations for the lie in which we adults convince ourselves that it is not really a lie after all, to ease our conscience about it.
Note that the issue is not one of literalism, as in your Aurora Borealis example. Like you say, everyone today knows that the term Aurora Borealis is not intended to refer to an actual dawn, so there is no lie in using it. But if someone did think it meant that, and I knew this, and they asked me if it exists, then I would be lying to say yes.
The relevant way in which it’s analogous is that a head of state can’t build [dangerous AI / nuclear weapons] without risking war (or sanctions, etc.).
Fair enough, but China and the US are not going to risk war over that unless they believe doom is anywhere close to as certain as Eliezer believes it to be. And they are not going to believe that, in part because that level of certainty is not justified by any argument anyone, including Eliezer, has provided. And even if I am wrong on the inside view/object level to say that, there is enough disagreement about that claim among AI existential risk researchers that a national government, taking the outside view, is unlikely to fully adopt Eliezer’s outlier viewpoint as its own.
But in return, we now have the tools of authoritarian control implemented within each participating country. And this is even if they don’t use their control over the computing supply to build powerful AI solely for themselves. Just the regime required to enforce such control would entail draconian invasions into the lives of every person and industry.
My preferred mechanism, and I think MIRI’s, would be an international treaty in which every country implements AI restrictions within its own borders. That means a head of state can’t build dangerous AI without risking war. It’s analogous to nuclear non-proliferation treaties.
The control required within each country to enforce such a ban breaks the analogy to nuclear non-proliferation.
Uranium is an input to a general purpose technology (electricity), but it is not a general purpose technology itself, so it is possible to control its enrichment without imposing authoritarian controls on every person and industry in their use of electricity. By contrast, AI chips are themselves a general purpose technology, and exerting the proposed degree of control would entail draconian limits on every person and industry in society.
Slowing down / pausing AI development gives us more time to work on all of those problems. Racing to build ASI means not only are we risking extinction from misalignment, but we’re also facing a high risk of outcomes such as, for example, ASI being developed so quickly that governments don’t have time to get a handle on what’s happening and we end up with Sam Altman as permanent world dictator.
This depends on what mechanism is used to pause. MIRI is proposing, among other things, draconian control over the worldwide compute supply. Whoever has such control has a huge amount of leverage over a transformative technology, which at least plausibly (and, in my view, very likely) increases the risk of ending up with a permanent world dictator, although the dictator in that scenario is perhaps more likely to be a head of state than the head of an AI lab.
Unfortunately, this means that there is no low risk path into the future, so I don’t think the tradeoff is as straightforward as you describe:
The tradeoff isn’t between solving scarcity at a high risk of extinction vs. never getting either of those things. It’s between solving scarcity now at a high risk of extinction, vs. solving scarcity later at a much lower risk.
There’s also the Federmann Center for the Study of Rationality, founded in 1991, where
faculty, students, and guests join forces to explore the rational basis of decision-making. Coming from a broad sweep of departments (mathematics, economics, psychology, biology, education, computer science, philosophy, political science, business, statistics, and law), its members look at how rationality — which, in decision-making, means the process by which individuals, groups, firms, plants, and other entities choose the path of maximum benefit — responds to real-world situations where individuals with different goals interact.
They say they are inspired by the work of Robert Aumann and Menahem Yaari.
My memory from reading Andrew Hodges’ authoritative biography of Turing is that his theory was designed as a tool to solve the Entscheidungsproblem, which was a pure mathematical problem posed by Hilbert. It just happened to be a convenient formalism for others later on. GPT-5 agrees with me.
This hypothesis was also proposed in 1998 on a different (play money) prediction market and the galaxy-brained trade succeeded for some in 2002.
And mine.
I don’t know much background here so I may be off base, but it’s possible that the motivation of the trust isn’t to bind leadership’s hands so that they must avoid profit-motivated decision making, but rather to free their hands to avoid it, ensuring that shareholders have no claim against them for such actions, as traditional governance structures might have provided.
(Unless “employees who signed a standard exit agreement” is doing a lot of work — maybe a substantial number of employees technically signed nonstandard agreements.)
Yeah, what about employees who refused to sign? Have we gotten any clarification on their situation?
Thank you, I appreciated this post quite a bit. There’s a paucity of historical information about this conflict which isn’t colored by partisan framing, and you seem to be coming from a place of skeptical, honest inquiry. I’d look forward to reading what you have to say about 1967.
Thanks for doing this! I think a lot of people would be very interested in the debate transcripts if you posted them on GitHub or something.
I’m mostly with you all the way up to and including this line. But I would also add: I have not seen a plausible vision painted for how you avoid a bad future, for any length of time, that does involve some kind of process that is just pretty godlike.
This is why I put myself in the “muddle through” camp. It’s not because I think doing so guarantees a good outcome; indeed I’d be hard-pressed even to say it makes it likely. It’s just that by trying to do more than that — to chart a path through territory that we can’t currently even see — we are likely to make the challenge even harder.
Consider someone in 1712 observing the first industrial steam engines, recognizing the revolutionary potential, and wanting to … make sure it goes well. Perhaps they can anticipate its use outside of coal mines — in mills and ships and trains. But there’s just no way they can envision all of the downstream consequences: electricity, radio and television, aircraft, computing, nuclear weapons, the Internet, Twitter, the effect Twitter will have on American democracy (which by the way doesn’t exist yet…), artificial intelligence, and so on. Any attempt made at that time to design, in detail, a path from the steam engine to a permanently good future would have been guaranteed at the very least to fail, and would probably have made things much worse to the extent it locally succeeded in doing anything drastic.
Our position is in many ways more challenging than theirs. We have to be humble about how far into the future we can see. I agree that an open society comes with great danger and it’s hard to see how that goes well in the face of rapid technological change. But so too is it hard to see how centralized power over the future leads to a good outcome, especially if the power centralization begins today, in an era when those who would by default possess that power seem to be … extraordinarily cruel and unenlightened. Just as you, rightly, cannot say if AIs who replace us would have any moral value, I also cannot say that an authoritarian future has any value. Indeed, I cannot even say that its value is not hugely negative.
What I can say, however, is that we have some clear problems directly in front of us, either occurring right now or definitely in sight, one of which is this very possibility of a centralized, authoritarian future, from which we would have no escape. I support muddling through only because I see no alternative.