How good at chess would it be before it started killing people or making paperclips? The argument is about what an artificial general intelligence would do.
TheAncientGeek
I don’t know what thinking about X will do to me. So either I never attempt to self-improve, or I take a chance.
If your beef is about unintelligent but super-efficient machines, why communicate with the AI community? That’s generally not what they are trying to build.
The claim behind my “But we humans can’t even do that!” is a weaker one: there are some moral questions with no consensus answer, or where there is a consensus but some people flout it. In situations like these, people sometimes even accuse other people outright of not knowing right from wrong, or incredulously ask, “don’t you know right from wrong?”
Absence of consensus does not imply absence of objective truth
I see no necessary reason why the same issues wouldn’t crop up for other, smarter intelligences.
I don’t know about “necessary”, but “they’re smarter” is possible and reasonably likely.
I suppose you could build an AI that had both drives to self-improve and an extreme caution about accidentally changing its other values (although evolution doesn’t seem to have built us that way). That gives you the welcome conclusion that the AI in question is potentially unfriendly, rather than the disturbing one that it is potentially self-correcting. But we already knew you could build unfriendly AIs if you want to: the question is whether the friendly or neutral AI you think you are building will turn on you, whether you can achieve unfriendliness without carefully designing it in.
Wei Dai’s comment is full of wisdom. In particular:
The Orthogonality Thesis (or its denial) must assume that certain types of AI, e.g., those based on generic optimization algorithms that can accept a wide range of objective functions, are feasible (or not) to build, but I don’t think we can safely make such assumptions yet.
But even if that is true, it is nowhere near enough to support an OT that can be plugged into an unfriendliness argument. The unfriendliness argument requires that it is reasonably likely that researchers could create a paperclipper without meaning to. However, if paperclippers require a particular architecture (a possible architecture, but only one possible architecture) where goals and their implementation are decoupled, then both requirements are undermined. It is not clear that we can build such machines (“based on generic optimization algorithms that can accept a wide range of objective functions”), hence a lack of likelihood; and it is also not clear that well-intentioned people would.
Unfriendliness of the sort that MIRI worries about could be sidestepped by not adopting the architecture that supports orthogonality and instead choosing one of a number of alternatives.
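As a concrete illustration of the kind of architecture Wei Dai’s quote describes, here is a minimal Python sketch of a generic optimizer that accepts an arbitrary objective function. The names and the random-search strategy are illustrative choices of mine, not anyone’s actual design; the point is only that the goal is decoupled from the machinery that pursues it.

```python
import random

def generic_optimizer(objective, candidates, steps=1000):
    """Search machinery that accepts any objective function whatsoever.

    The goal is fully decoupled from the implementation: swapping in a
    different objective changes what the system pursues without touching
    a single line of the optimizer itself.
    """
    best = random.choice(candidates)
    for _ in range(steps):
        challenger = random.choice(candidates)
        if objective(challenger) > objective(best):
            best = challenger
    return best

states = list(range(-100, 101))
# The same machinery pursues completely different goals:
print(generic_optimizer(lambda x: -(x - 7) ** 2, states))  # converges on 7
print(generic_optimizer(lambda x: -abs(x + 42), states))   # converges on -42
```

Whether anything like this scales up to a paperclipper is exactly the feasibility question the quote says we cannot yet settle.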
What does “most” AGIs mean? Most we are likely to build? When our only model of AGI is human intelligence?
There is no engineering process corresponding to a random dip into mind space.
You are assuming that the AI needs something from us, which may not be true as it develops further. The decorator follows the implied wishes not because he is smart enough to know what they are, but because he wishes to act in his client’s interest to gain payment, reputation, etc. Or he may believe that fulfilling his client’s wishes is morally good according to his morality. The mere fact that the wishes of his client are known does not guarantee that he will carry them out unless he values the client in some way to begin with (for their money or maybe their happiness).
You are assuming that an AI will have only instrumental rationality; that the OT is true.
Using this definition, everything containing the same number of atoms would be equally complex, since you have to specify where each atom is. This does not feel correct. The authors modified the word “complexity” into something meaningless, and most likely did not do so accidentally.
Fixing this problem is harder than complaining about it. A formal definition that captures intuitive notions of complexity seems to be lacking.
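For reference, the best-known formal candidate is Kolmogorov complexity: the length of the shortest program that outputs a complete description of the object, relative to a fixed universal Turing machine U,

K(x) = \min \{\, |p| : U(p) = x \,\}

where |p| is the length of program p. Note that this candidate has exactly the failure complained about above: a random arrangement of atoms must be specified atom by atom, so it comes out maximally complex, clashing with the intuition that noise is not “complex”. Refinements such as Bennett’s logical depth try to capture organised complexity instead, but, as the comment says, no formalisation is generally agreed to succeed.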
If you can build an AI like that even in theory, then the “universal morality” isn’t universal, just a very powerful attractor.
Objective moral truth is only universal to a certain category of agents. It doesn’t apply to sticks and stones, and it isn’t discoverable by crazy people, or people below a certain level of intelligence. If it isn’t discoverable to a typical LW-style AI, with an orthogonal architecture, unupdatable goals, and purely instrumental rationality (I’m tempted to call them Artificial Obsessive Compulsives), then so much the worse for them. That would be a further reason for filing a paperclipper under “crazy person” rather than “rational agent”.
Evolution does very much seem to have built us this way, just very incompetently.
You are portraying morality as something arbitrary and without rational basis that we are nonetheless compelled to believe in. That is more of a habit of thought than an argument.
If there is such a universal morality, or strong attractor, it’s almost certainly something mathematically simple and in no way related to the complex, fragile values humans have evolved.
Who says it is in no way related? Simple universal principles can pan out to complex and localised values when they are applied to complex and localised situations. The simple universal laws of physics don’t mean every physical thing is simple.
To us, it’d not seem moral at all, but horrifying and either completely incomprehensible, or converting us to it through something like nihilism, existential horror, or Pascal’s-wager-style exploits of decision theory, not appealing to human-specific things like compassion or fun. After all, it has to work through the kind of features present in all sufficiently intelligent agents.
Lots of conclusions, but not many arguments there. The only plausible point is that an abstract universal morality would be dry and unappealing. But that isn’t much of a case against moral objectivism, which only requires moral claims to be true.
For an example of what a morality that is in some sense universal looks like, look to the horror called evolution.
I don’t think “evolution” and “morality” are synonyms. In fact, I don’t see much connection at all.
Thus, any AI that is not constructed in this paranoid way is catastrophically unfriendly, on a much deeper level than any solution yet discovered.
Interesting use of “thus”, there.
But such a machine would not be truly intelligent... That’s actually my definition of “superintelligent”.
If no-one is actually working on that kind of intelligence, one that’s highly efficient at arbitrary and rigid goals (an AOC)... then what’s the problem?
I believe that a fundamental requirement for any rational agent is the motivation to act maximally intelligently and correctly. That requirement seems even more obvious if we are talking about a conjectured artificial general intelligence (AGI) that is able to improve itself to the point where it is substantially better at most activities than humans. For if it didn’t want to be maximally correct, it wouldn’t become superhumanly intelligent in the first place.
The standard counterargument is along the lines of: it won’t care about getting things right per se, it will only employ rationality as a means to other goals. (Or: instrumental rationality is the only kind, because that’s how we define rationality.)
What justifies the “will”, the claim of necessity or at least of high probability? That brings us back to the title of the original posting: evidence for the Orthogonality Thesis. Is non-instrumental rationality, rationality-as-a-goal, impossible? Is no-one trying to build it? Why try to build single-minded Artificial Obsessive Compulsives if that is dangerous? Isn’t rationality-as-a-goal a safer architecture?
I wasn’t trying to argue, just explain what appears to be the general consensus stance around here.
I’m not very concerned about consensus views unless they are supported by good arguments.
You seem to be using a lot of definitions differently from everyone else.
I believe that I am using definitions that are standard for the world at large, if not to LW.
“Nothing other than an FAI has any morality. All intelligences, in all the multiverse, that are not deliberately made by humans to be otherwise, are crazy, in such a way it’ll remain so no matter how intelligent and powerful it gets.”
Does “nothing other than an FAI” include humans?
“Smart” implicitly entails “knows the true beliefs”, whereas it doesn’t entail “has the right goals”.
It doesn’t exclude having the right goals, either. You could engineer something whose self-improvement was restricted from affecting its goals. But if that is dangerous, why would you?
The argument I tend to default to is, “if there were definitively no fundamental moral values, how would we expect the universe we observe to be different?”
If there were no mathematical truths, would the observable universe be different?
If we can’t point to any way that moral objectivity constrains our expectations, then it becomes another invisible dragon.
If every intelligent entity just passively recorded facts, that would be valid. But agents act, and morality is about acting rightly.
If I’m understanding the question correctly, then probably. Assuming for the sake of argument that there could be an observable universe at all without mathematical truths, I’d say we should at least expect things like the same numbers adding up to different sums in different contexts, circles having variable ratios of circumference to radius, etc.
You think mathematical truth is causal, somehow?
This doesn’t imply that any sort of moral objectivity need exist.
It wasn’t meant to: it was an argument against an argument against a claim, not an argument for a counter-claim. You were arguing that moral truths do not have the epistemology that would be expected of empirical truths: but they are not empirical truths.
For our universe to run on mathematical laws, there have to be some.
Laws may be causal. I was asking about truths.
Not every mathematical truth need apply directly to the real world, but if none of them did, then we’d have rather less reason to suspect that they were actually truths.
The vast majority of them do not apply to the real world. For every inverse-square law that applies, there is an inverse-cube law (etc.) that does not. However, that is physics. (Pure) mathematicians aren’t concerned about that.
Can you give any examples of things we would mutually recognize as truths for which we cannot observe evidence? Math, as we have already covered, I do not acknowledge as an example.
OTOH, I have never seen a mathematical proof that used observation or experiment.
There’s a certain probability that it would do the right thing anyway, a certain probability that it wouldn’t, and so on. The probability of an AGI turning unfriendly depends on those other probabilities, although very little attention has been given to moral realism/objectivism/convergence by MIRI.
There are multiple variations on the OT, and the kind that just say it is possible can’t support the UFAI argument. The UFAI argument is conjunctive, and each stage in the conjunction needs to have a non-negligible probability, else it is a Pascal’s Mugging.
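To spell out the conjunction arithmetic (the stage probabilities below are purely illustrative, not estimates anyone has defended): if the argument requires n stages, each holding with probability p_i, and the stages are roughly independent, then

P(\text{UFAI}) = \prod_{i=1}^{n} p_i, \qquad \text{e.g.}\ \ 0.8 \times 0.7 \times 0.5 \times 0.5 \times 0.3 \approx 0.04

Even with every stage individually non-negligible, the product shrinks quickly; and if any single stage is assigned only a negligible probability, the force of the whole argument collapses with it.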
Then open the prisons.