The way to fix the quoted argument is to have the utility function be random, grafted onto some otherwise-functioning AI.
Not demonstrably doable. It arises from wrong intuitions, from thinking too much about AIs with oracular powers of prediction which straightforwardly maximize the utility, rather than about realistic cases, on limited hardware, which have limited foresight and employ instrumental strategies and goals that have to be derived from the utility function (and which can alter the utility function unless it is protected; the fact that utility modification scores poorly under the utility itself is insufficient protection when the AI relies on strategies and limited foresight).
Furthermore, a utility function can be self-destructive.
A random utility function is maximized by a random state of the universe.
False. A random code for a function crashes (or never terminates). Of the codes that do not crash, the simplest codes massively predominate. The claim is demonstrably false if you try to generate random utility functions by generating random C code that evaluates the utility of some test environment.
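A minimal sketch of this experiment, using random Python arithmetic expressions over a tiny alphabet rather than random C code (an assumption for portability; the comment speaks of C). Each random string is treated as a candidate utility function of a test input `x`, and anything that fails to parse or evaluate counts as a crash:

```python
import random

random.seed(0)
ALPHABET = "x0123456789+-*/() "  # tiny expression language over a test input x

def random_expr(length):
    """Generate a random candidate 'utility function' as a character string."""
    return "".join(random.choice(ALPHABET) for _ in range(length))

def evaluate(code, x=42):
    """Try to use the random code as a utility function of a test environment x."""
    if "**" in code:
        return None  # skip accidental huge exponentiations to keep the demo fast
    try:
        value = eval(code, {"__builtins__": {}}, {"x": x})
        return value if isinstance(value, (int, float)) else None
    except Exception:
        return None  # syntax error, division by zero, etc. -- the code "crashes"

trials = 10_000
ok = [v for v in (evaluate(random_expr(12)) for _ in range(trials)) if v is not None]
print(f"{len(ok)}/{trials} random programs produced a numeric utility")
```

In this sketch the overwhelming majority of random strings crash, and the survivors are dominated by trivially simple expressions (lone digits, short sums), which is the pattern the comment predicts.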
The problem I have with those arguments is that (a) many things are plainly false, and (b) you try to 'fix' things by bolting more and more conjuncts ('you can graft random utility functions onto well-functioning AIs') onto your giant scary conjunction, instead of updating when contradicted. That is a definite sign of rationalization. It can also always be done no matter how much counter-argument exists: you can always add something to the scary conjunction to make it happen. Adding conditions to a conjunction should decrease its probability.
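The last point is plain arithmetic. Under made-up, purely illustrative probabilities for each conjunct (and an independence assumption), every condition added to the conjunction multiplies in a factor at most 1, so the joint probability can only shrink:

```python
# Hypothetical P(condition_i) for each added conjunct -- illustrative numbers only.
conjuncts = [0.9, 0.8, 0.8, 0.7, 0.5]

p = 1.0
for i, q in enumerate(conjuncts, 1):
    p *= q  # each new conjunct multiplies in a factor <= 1
    print(f"P(first {i} conjuncts) <= {p:.3f}")
```

The joint probability ends up below every individual conjunct, which is why patching a theory by adding conditions should lower, not preserve, one's credence in it.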
I'd rather be concerned with implementations of functions: Turing machine tapes, C code, x86 instructions, and the like.
In any case the point is rather moot, because the function is human-generated. Hopefully humans can do better than random, though I wouldn't wager on it: FAI attempts are potentially worrisome, since humans are sloppy programmers and buggy FAIs would follow entirely different statistics. Still, I would expect buggy FAIs to be predominantly self-destructive. (I'm just not sure whether the non-self-destructive buggy FAI attempts are predominantly mankind-destroying or not.)