Short and long term goals have different implications regarding instrumental convergence. If I have the goal of immediately taking a bite of an apple that is in my hand right now, I don’t need to gather resources or consider strategies; I can just do it. On the other hand, imagine I have an apple in my hand and I want to take a bite of it in a trillion years. I need to (define ‘me’, ‘apple’, and ‘bite’; and) secure maximum resources, to allow the apple and me to survive that long in the face of nature, competitors and entropy. Thus, I instrumentally converge to throwing everything at universal takeover, except the basic necessities crucial to my goal.
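To put a rough number on the trillion-year apple, here is a minimal sketch. It assumes, purely for illustration, that the apple and I face a constant, independent chance of being destroyed each year; the function name `max_yearly_hazard` and the 99% target are my own inventions, not part of any established model.

```python
import math

def max_yearly_hazard(target_survival: float, horizon_years: float) -> float:
    """Largest constant per-year failure probability h such that
    (1 - h) ** horizon_years >= target_survival (toy assumption)."""
    # Solve (1 - h)^T = s  =>  h = 1 - s^(1/T) = -expm1(ln(s) / T).
    # expm1 keeps precision when ln(s) / T is extremely close to zero.
    return -math.expm1(math.log(target_survival) / horizon_years)

# Hypothetical horizons: biting the apple within a year vs. in a trillion years.
print(max_yearly_hazard(0.99, 1))
print(max_yearly_hazard(0.99, 1e12))
```

Under these assumptions the one-year goal tolerates about a 1% yearly risk, while the trillion-year goal tolerates a yearly risk on the order of 1e-14. Driving risk that low is what the resource gathering would be for.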
Some of the cruxes that high P(Doom) rests on are that (sufficiently) General Intelligences will (1) have goals, (2) which will be long term, (3) and thus will instrumentally converge to wanting resources, (4) which are easiest to get with humans (and other AIs they might build) out of the way, (5) so when they can get away with it they’ll do away with humans.
So if we make General Intelligences with short term goals perhaps we don’t need to fear AI apocalypse.
Assuming the first crux, why the second? That is, assuming GIs will have goals, what are the best reasons to think that such intelligences will by default have long term goals (as opposed to short term goals like “quickly give a good answer to the question I was just asked”)?
We’re very likely to give them long term goals. And as I explain here: https://www.lesswrong.com/posts/x8bK7ohAHzMMchsaC/what-is-autonomy-and-how-does-it-lead-to-greater-risk-from, lots of things people are likely to request seem near certain to lead to complex and autonomous systems.
In many cases, you also need incorrigibility, and stability under improvement.
I’m not sure I understand; are you saying that given these, we have high P(Doom), or that these are necessary to be safe even if GIs have only short term goals? Or something else entirely?
You need them for high P(doom) because otherwise the AI is corrigible, or stops at the thousandth paperclip.
The key is that if AGIs are smarter than humans, organizations that run AGIs with long-term goals will outperform organizations that mix humans who have long-term goals with AGIs that are only capable of pursuing short-term goals.
One long-term goal that many AGIs are going to have is to create training data to make the AGI more effective. Training data is good when it makes the AGI more performant over a long timeframe.
AGIs that run that way are going to outperform AGIs where humans oversee all the training data.
If the LT goal of the AI is perfectly aligned with the goals of the organisation, yes. But smarter isn’t enough; it needs to be infallible. If it’s fallible, the organisation needs to be able to tweak the goals as it goes along. Remember, smarter means it’s better at executing its goal, not at understanding it.
The main goal of most companies is to make money. If an AGI that runs a company is better at that, it will outcompete other companies. It doesn’t need infallibility. Companies run by humans are also not perfectly aligned, and the interests of managers and the interests of the company are different.
It’s to make money without breaking the law. An ASI that fulfils both goals isn’t going to kill everybody, since murder is illegal. So even if you do have ASIs with stable long term goals, they don’t lead to doom. (It’s interesting to think of the chilling effect of a law that any human who creates an agentive AI is criminally responsible for what it does.)
Most big companies don’t have the goal of making money without breaking the law; they are often willing to break it as long as the punishment for breaking it isn’t too costly.
But even if the AGI doesn’t murder anyone in the first five years it operates, it can still focus on acquiring resources, become untouchable by human actors, and then engage in actions that lead to people dying. The Holodomor wasn’t directly murder, but people still died because they didn’t have food.
So long term goals aren’t a default; market pressure will put them there as humans slowly cede more and more control to AIs, simply because the latter are making decisions that work out better. Presumably this would start with lower level decisions (e.g. how exactly to write this line of code; which employee to reward based on performance), with AIs then slowly being given higher level decisions to make. In particular, we don’t die the first time someone creates an AI with the ability to (escape, self improve and then) kill the competing humans, because that AI is likely focused on a much smaller, nearer-term goal. That way, if we’re careful and clever, we have a chance to study a smarter-than-human general intelligence without dying. Is that an accurate description of how you see things playing out?
If a powerful company is controlled by an AGI, it doesn’t need to kill competing humans to prevent the humans from shutting the AGI down or modifying it.
We don’t know how to distinguish systems with long- and short-term goals. Even in principle, we don’t know how to say whether an AIXI-like program, running on a hypercomputer, will optimize for a long- or short-term goal. I.e., to your proposition “if we build AI with a short-term goal, we are safe” the correct response is “What exactly do you mean by short-term goal?”
Even what we intuitively understand to be a “short-term goal” can be pretty scary. If something can bootstrap nanotech in a week, a planning horizon of 10 days doesn’t save us.
As to the definition of short term goal: any goal that can be achieved (fully, e.g. without an “and keep it that way” clause) in a finite short time (for instance, in a few seconds), with the resources the system already has at hand. Equivalently, I think: any goal that doesn’t push instrumental power seeking. As to how we know a system has a short term goal: if we could argue that systems prefer short term goals by default, then we still wouldn’t know the goals of a particular system, but we could hazard a guess that the goals are short term. Perhaps we could expect short term goals by default if they were, for instance, easier to specify, and thus to have. As pointed out by others, if we try to give systems long term goals on purpose, they will probably end up with long term goals.
The problem is the way we train AIs. We ALWAYS minimize error and optimize towards a limit. If I train an AI to take a bite out of an apple, what I am really doing is showing it thousands of example situations and rewarding it for acting in ways that improve the probability that it eats the apple.
Now let’s say it goes super intelligent. It doesn’t just eat one apple and say “cool, I am done, time to shut down.” No, we taught it to optimize the situation so as to improve the probability that it eats an apple. For lack of better words, it feels “pleasure” in optimizing situations towards taking a bite out of an apple.
Once the probability of eating an apple reaches 100%, it will eventually drop as the apple is eaten, and then the AI will once again start optimizing towards eating another apple.
It will try to set up situations where it eats apples for all eternity. (Assuming superintelligence does not result in some type of goal enlightenment.)
Ok, ok, you say. Well, we will just hard-code it to turn off once it reaches a certain probability of meeting its goal. Good idea. Once it reaches 99.9% probability of taking a bite out of an apple, we automatically turn it off. That will probably work for an apple eating AI.
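As a toy illustration of that cutoff idea (not a claim about how any real training setup works), here is a minimal sketch of an optimizer that keeps improving its estimated success probability and is halted once it crosses a threshold. The names `optimize_with_cutoff` and `halve_remaining_risk`, and the assumption that each step halves the remaining failure probability, are hypothetical and chosen only to keep the example small.

```python
from typing import Callable, Tuple

def optimize_with_cutoff(
    p_start: float,
    improve: Callable[[float], float],
    threshold: float = 0.999,
    max_steps: int = 10_000,
) -> Tuple[float, int]:
    """Apply `improve` to the success probability until it reaches
    `threshold` (the hard-coded shutdown point) or max_steps runs out."""
    p = p_start
    steps = 0
    while p < threshold and steps < max_steps:
        p = improve(p)
        steps += 1
    # Returning here plays the role of "we automatically turn it off".
    return p, steps

def halve_remaining_risk(p: float) -> float:
    # Hypothetical improvement rule: each step removes half the failure chance.
    return p + (1.0 - p) * 0.5

final_p, steps_taken = optimize_with_cutoff(0.5, halve_remaining_risk)
print(final_p, steps_taken)
```

The sketch only works because the stop check sits outside anything the optimizer can influence. That is exactly the part that is in doubt for a system smart enough to care about being shut down.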
But what if our goal is more complicated? (Like fixing climate change.) Well, the AI may reach superintelligence before finishing the goal and decide it doesn’t want to be shut down. Good luck stopping it.