I disagree with this post. At the very least, I feel like there should be some kind of caveat or limit regarding the size of the organization, or the distance one has from it. For example, if I’m writing a post or comment about some poor experience I had with Amazon, do I have a moral obligation to run that post by Amazon’s PR beforehand? No. Amazon is a huge company, and I’m not really connected to them in any way, so I do not and should not feel any obligation towards them prior to sharing my experiences with their products or services.
An archived, un-paywalled version of the article is available here.
My point is this: we should focus first on limiting the most potent vectors of attack: those which involve conventional ‘weapons’.
That’s exactly where I disagree. Conventional weapons aren’t all that potent compared to social, economic, or environmental changes.
Does comparing neurons firing with beliefs/desires involve a type distinction that renders beliefs/desires irreducible to patterns of neuron activity?
I don’t think it does, but I do think that the difference in scale between a neuron firing and an entire belief forming makes the reduction very difficult, and possibly pointless. It’s a bit like reducing the spray of water out of a garden hose to the movement of individual water molecules. It’s very difficult, given that each water molecule’s motion contributes only an infinitesimal amount to the movement of the water as a whole. Furthermore, even if you could map particular interactions between water molecules to specific motions of water droplets or the water stream as a whole, would you learn anything new thereby? Would it help solve any of the problems you’re interested in? A lot of the time, it’s better to work at higher levels of abstraction.
Let’s define a weapon as any tool which could be used to mount an attack.
Why? That broadens the definition of “weapon” to mean literally any tool, technology, or tactic by which one person or organization can gain an advantage over another. It’s far broader than, and connotationally very different from, the implied definition of “weapon” given by “building intelligent machines that are designed to kill people” and the examples of “suicide drones”, “assassin drones”, and “robot dogs with mounted guns”.
Redefining “weapon” in this way turns your argument into a motte-and-bailey, where a word that connotes direct physical harm (e.g. robots armed with guns, bombs, knives, etc.) is stretched to mean any machine that can, on its own, gain some kind of resource advantage over humans. Most people would not, for example, consider a superior stock-trading algorithm to be a “weapon”, but by your (re)definition, it would be.
However, weapons provide the most dangerous vector of attack for a rogue, confused, or otherwise misanthropic AI.
I’m not sure why you think that. Human weapons, as horrific as they are, can only cause localized tragedies. Even if we gave the AI access to all of our nuclear weapons, and it fired them all, humanity would not be wiped out. Millions (possibly billions) would perish. Civilization would likely collapse or be set back by centuries. But human extinction? No. We’re tougher than that.
But an AI that competes with humanity, in the same way that Homo sapiens competed with Homo neanderthalensis? That could wipe out humanity. We wipe out other species all the time, and only in a small minority of cases is it because we’ve turned our weapons on them and hunted them into extinction. It’s far more common for a species to go extinct because humanity wanted the habitat and other natural resources that the species depended on, and outcompeted it for access to those resources.
Human mercenaries causing a societal collapse? That would mean a large number of individuals who are willing to take orders from a machine to actively harm their communities. Very unlikely.
I’m wondering how you can hold that position given all the recent social disorder we’ve seen all over the world, where social-media-driven outrage cycles have been a significant accelerating factor. People are absolutely willing to “take orders from a machine” (i.e. participate in collective action based on memes from social media) in order to “harm their communities” (i.e. cause violence and property destruction).
What is an “intelligent” machine? What is a machine that is “designed” to kill people? Why should a machine with limited intelligence that is “designed” to kill, such as an AIM-9, be more of a threat than a machine with vast intelligence that is designed to accomplish a seemingly innocuous goal, but that has the destruction of humanity as an unintended side effect?
Currently, leading militaries around the world are developing and using:
Drone swarms
Suicide drones
Assassin drones
Intelligent AI pilots for fighter jets
Targeting based on facial recognition
Robot dogs with mounted guns
None of these things scare me as much as GPT-4. Militaries are overwhelmingly staid and conservative institutions. They are the ones that are most likely to require extensive safeguards and humans-in-the-loop. What does scare me is the notion of a private entity developing a superintelligence, or an uncontrolled iterative process that will lead to a superintelligence and letting it loose accidentally.
I fail to see how Jungian analysis can actually debug LLMs better than the approach that Robert_AIZI used in their analysis of the “SolidGoldMagikarp” glitch token.
It is a belief that doesn’t pay rent. Let’s assume that there is such a thing as a collective unconscious, which is the source of archetypes. What additional predictions does this enable? Why should I add the notion of a collective unconscious to my existing psychological theory? Why shouldn’t I trim away this epicycle with Occam’s Razor?
The idea that individuals are driven by subconscious or unconscious instincts is a well-established fact of psychology. The idea of a collective unconscious, in the way that Jung described it, is the unfalsifiable woo.
My objection is that any intelligence that is capable of considering these arguments and updating its goals in response is an intelligence that is either already aligned or capable of being brought into alignment (i.e. “corrigible”).
An unaligned intelligence will have just as much comprehension of this post as a shredder has of the paper it’s chewing to pieces.
Let’s assume there is a rational agent without a goal.
Why should we assume this? Where would such an agent come from? Who would create it?
I had several teachers in both high school and college assign writing assignments with maximum word counts, e.g. “Write an explanation of the significance of the Treaty of Westphalia, in no more than 1500 words.” Those assignments were, to me, more difficult than the ones with minimum word counts, because they required you to cover all the major elements of the response in a minimum of space, while still leaving room for explanation and implication.
And, for what it’s worth, writing in the “real world” is far more like this. Journalists, for example, rarely have minimum word count limits, but very often have maximum word counts.
I’ve never understood the obsession with going to bed and getting up at fixed times, independent of the seasons and everything else. (Is it a general American thing? I don’t hear about it in the UK.)
It’s a you-have-to-commute-to-work thing. If you’re expected in the office by a particular time (e.g. for morning stand-up), then you need to leave at a particular time. This implies you need to wake up at a particular time, so you can brush your teeth, shower, get dressed, etc.
At every time step, the AI will be trading off these drives against the value of producing more or doing more of whatever it was programmed to do. What happens when the AI decides that it’s learned enough from the biosphere, and that the potential benefit of learning more about biology, evolution, and thermodynamics no longer outweighs the cost of preserving a biosphere for humans?
We humans make these trade-offs all the time, often unconsciously, as we weigh whether to bulldoze a forest, or build a dam, or dig a mine. A superintelligent AI will perhaps be more intentional in its calculations, but that’s still no guarantee that the result of the calculation will swing in humanity’s favor. We could, in theory, program the AI to preserve earth as a sanctuary. But, in my view, that’s functionally equivalent to solving alignment.
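To make that concrete with a deliberately toy sketch (every term and weight below is invented for illustration, not taken from the post): if the AI’s objective gives no intrinsic weight to keeping the biosphere intact, “preserve” wins only as long as the expected information value exceeds the resource cost, and flips the moment it doesn’t.

    # Purely illustrative: a toy version of the trade-off described above.
    # All values are invented; the point is that the outcome hinges entirely
    # on what the objective happens to weigh.

    def net_value_of_preserving(info_value, resource_cost, biosphere_weight=0.0):
        # info_value: expected value of what's left to learn from the biosphere
        # resource_cost: opportunity cost of not repurposing those resources
        # biosphere_weight: intrinsic weight on preservation; zero unless
        #                   alignment work puts it there
        return info_value + biosphere_weight - resource_cost

    # Early on, there is still a lot to learn, so preservation wins.
    print(net_value_of_preserving(info_value=10.0, resource_cost=3.0))  # 7.0 -> preserve

    # Once the biosphere is "learned out", the same calculation flips.
    print(net_value_of_preserving(info_value=0.5, resource_cost=3.0))   # -2.5 -> repurpose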
Your argument appears to be that an unaligned AI will spontaneously choose, at the very least, to preserve Earth as a sanctuary for humans in perpetuity. I still don’t see why it should do that.
Why should the AI prioritize preserving information over whatever other goal it’s been programmed to accomplish?
The Sequences post Doublethink (Choosing to be Biased) addresses the general form of this question, which is, “Is it ever optimal to adopt irrational beliefs in order to advance instrumental goals, such as happiness, wealth, etc.?”
I’ll quote at length what I think is the relevant part of the post:
For second-order rationality to be genuinely rational, you would first need a good model of reality, to extrapolate the consequences of rationality and irrationality. If you then chose to be first-order irrational, you would need to forget this accurate view. And then forget the act of forgetting. I don’t mean to commit the logical fallacy of generalizing from fictional evidence, but I think Orwell did a good job of extrapolating where this path leads.
You can’t know the consequences of being biased, until you have already debiased yourself. And then it is too late for self-deception.
The other alternative is to choose blindly to remain biased, without any clear idea of the consequences. This is not second-order rationality. It is willful stupidity.
Be irrationally optimistic about your driving skills, and you will be happily unconcerned where others sweat and fear. You won’t have to put up with the inconvenience of a seat belt. You will be happily unconcerned for a day, a week, a year. Then crash, and spend the rest of your life wishing you could scratch the itch in your phantom limb. Or paralyzed from the neck down. Or dead. It’s not inevitable, but it’s possible; how probable is it? You can’t make that tradeoff rationally unless you know your real driving skills, so you can figure out how much danger you’re placing yourself in. You can’t make that tradeoff rationally unless you know about biases like neglect of probability.
No matter how many days go by in blissful ignorance, it only takes a single mistake to undo a human life, to outweigh every penny you picked up from the railroad tracks of stupidity.
In other words, the trouble with willfully blinding yourself to reality is that you don’t get to choose what you’re blinding yourself to. It’s very difficult to say, “I’m going to ignore rationality in these specific domains, and only these specific domains.” The human brain really isn’t set up like that. If you’re going to abandon rational thought in favor of religious thought, are you sure you’ll be able to stop before you’re, say, questioning the efficacy of vaccines?
Another way of looking at the situation is by thinking about The Litany of Gendlin:
What is true is already so.
Owning up to it doesn’t make it worse.
Not being open about it doesn’t make it go away.
And because it’s true, it is what is there to be interacted with.
Anything untrue isn’t there to be lived.
People can stand what is true,
for they are already enduring it.
If an AI is capable of taking 99% of the resources that humans rely on to live, it’s capable of taking 100%.
Tell me why the AI should stop at 99% (or 85%, or 70%, or whatever threshold you wish to draw) without having that threshold encoded as one of its goals.
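To put the same point in toy-model terms (again, everything here is invented purely for illustration): an optimizer stops at a threshold only if the threshold is part of the objective it’s maximizing. Nothing about optimization itself singles out 99% as a natural stopping point.

    # Purely illustrative: a brute-force "optimizer" over what fraction of a
    # resource to take. The threshold matters only if it appears in the
    # objective being maximized.

    def best_fraction(objective, steps=1000):
        # Return the fraction in [0, 1] that maximizes the given objective.
        candidates = [i / steps for i in range(steps + 1)]
        return max(candidates, key=objective)

    # An objective that only values resources taken: the optimum is 1.0 (100%).
    print(best_fraction(lambda f: f))                         # 1.0

    # Only an objective with the cap encoded in it stops short of that.
    print(best_fraction(lambda f: f if f <= 0.99 else -1.0))  # 0.99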
To repeat what I said above: even a total launch of all the nuclear weapons in the world would not be sufficient to ensure human extinction. However, AI-driven social, economic, and environmental changes could ensure just that.
If an AI got hold of a few nuclear weapons and launched them, that would, in fact, probably be counterproductive from the AI’s perspective, because in the face of such a clear warning sign, humanity would probably unite and shut down AI research and unplug its GPU clusters.