Except that the AI that's worth its salt, as far as danger goes, does not, in fact, calculate EU(Z) or EU(A). It did not produce a function that calculates the expected overall utility of a move, because it couldn't: that takes too much computing power, and it's a bad approach. It did look at the final board state's utility function (the win/draw/loss one), and it did look at the rules of the game, and it did some thinking (how can I, without being able to calculate EU(Z), make moves that would work?) and came up with an approach based on that function. (Incidentally, this approach is applicable only to fairly simple utility functions of some future state.)
An AI needs to be programmed in a specific, very under-optimized way to allow you to make the sort of modification you're proposing here.
Keep in mind that neat real-valued utility functions are an entirely abstract, idealized model, used to reason about idealized decision-making by an agent with infinite computing power and the like. A real-world AI has limited computing power, and the name of the game is to make the best use of the computing power available, which means making decisions that help to increase the utility without ever calculating the utility directly or comparing real numbers. Such an AI, running under a utility function, will have a lot of code that is derived to help increase utility but does not do so by calculating the utility. It would then be impossible to just swap that utility out. Efficient code is inflexible.
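For a concrete toy illustration (mine, not taken from any real engine): a material-count evaluator is the kind of derived code I mean. It exists because it helps win, but it never computes the win/draw/loss utility it was derived from, so replacing that terminal utility changes nothing about it.

```python
# Toy example of 'derived' code: a material-count heuristic. More
# material tends to mean better winning chances, but nothing here
# ever calculates the win/draw/loss utility itself.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_score(pieces):
    """pieces: iterable of piece letters; uppercase = ours, lowercase = theirs."""
    score = 0
    for piece in pieces:
        value = PIECE_VALUES[piece.lower()]
        score += value if piece.isupper() else -value
    return score
```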
Furthermore, a sufficiently inefficient AI (such as an idealized utility maximizer where you just go ahead and replace one utility with another, and which doesn't self-optimize beyond the naive approach) is not much of a threat, even given a lot of computational power. Game trees expand exponentially with depth, so the reachable depth is only logarithmic in computing power.
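A quick back-of-the-envelope check (my own numbers, assuming a chess-like branching factor of about 30):

```python
import math

def reachable_depth(node_budget, branching_factor=30):
    # Searching to depth d costs about branching_factor**d nodes,
    # so the affordable depth is logarithmic in the node budget.
    return math.floor(math.log(node_budget, branching_factor))

for budget in (10**6, 10**9, 10**12):
    print(f"{budget} nodes -> about {reachable_depth(budget)} plies")
# A million-fold increase in computing power (10**6 -> 10**12 nodes)
# only roughly doubles the naive search depth, from ~4 plies to ~8.
```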
edit: here is an example. Utility maximization and utility functions are to practical (and scary) AI as quantum chromodynamics is to the practical computer graphics software I write for a living. That is to say, you would probably have as much luck modifying an AI's behaviour by editing utility functions as you would have getting my cloud renderer to draw pink clouds by using a modified Standard Model.
it did look at the rules of the game, and it did some thinking (how can I, without being able to calculate EU(Z), make moves that would work?) and came up with an approach based on that function. (Incidentally, this approach is applicable only to fairly simple utility functions of some future state.)
And that's where it comes up with: play randomly for three moves, then apply the material-advantage process. This maximises the new utility function without needing to calculate EU(Z) (or EU(A)).
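A minimal sketch of the resulting policy, where legal_moves and best_material_move are hypothetical stand-ins for the engine's actual move generation and material-advantage search:

```python
import random

def choose_move(board, move_number, legal_moves, best_material_move):
    # legal_moves and best_material_move are hypothetical stand-ins,
    # passed in as functions, for real engine code.
    if move_number <= 3:
        # The modified utility is indifferent between the first three
        # moves, so a random legal move scores as well as any other.
        return random.choice(legal_moves(board))
    # From move four onward, revert to the material-advantage process.
    return best_material_move(board)
```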
An AI needs to be programmed in a specific, very under-optimized way to allow you to make the sort of modification you're proposing here.
Specific, certainly; under-optimised is debatable.
For a seed AI, we can build the indifference in early and, under broad conditions, be certain that it will retain the indifference at any later step.
And why exactly does this 'play randomly for 3 moves, then apply material advantage' give better utility than just applying material advantage?
Plus you've got yourself a utility function that is entirely ill-defined in a screwy, self-referential way (since the expected utility of a move ultimately depends on the AI itself and its ability to use the post-move state to its advantage). You can talk about it in words, but you haven't defined it beyond 'okay, now it will make the AI indifferent'.
Contrast that with the original, well-defined utility function over future states: the AI may be unable to predict the future states and calculate utility numbers to assign to moves, but it can calculate the utility of a particular end state of the board, and it can reason from that to strategies. Originally, there is a simple thing for it to reason about. I can write Python code that looks at a board and returns the available legal moves, or the win/loss/tie utility if it is an end state. That is the definition of the chess utility; the AI can take it and reason about it. You, instead, have some utility function that feeds the AI's own conclusions about the utility of potential moves back into the utility function.
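For instance, with the python-chess library (my choice of library here; any legal-move generator would do), that definition is only a few lines:

```python
import chess  # the python-chess library

def chess_utility(board: chess.Board):
    """The definition of chess utility described above: the legal moves
    for a non-terminal board, or the win/loss/tie utility if it is an
    end state."""
    if board.is_game_over():
        outcome = board.outcome()
        if outcome.winner is None:            # stalemate, repetition, ...
            return [], 0                      # tie
        return [], 1 if outcome.winner == chess.WHITE else -1
    return list(board.legal_moves), None      # not an end state: no utility yet
```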
And why exactly does this 'play randomly for 3 moves, then apply material advantage' give better utility than just applying material advantage?
In this instance, they won't differ at all. But if the AI had some preferences outside of the chessboard, then the indifferent AI would be open to playing any particular move (for the first three turns) in exchange for some separate utility gain.
Plus you've got yourself a utility function that is entirely ill-defined in a screwy, self-referential way
In fact, no. It seems that way because of the informal language I used, but the utility function is perfectly well defined without any reference to the AI. The only self-reference is the usual one: how do I predict my future actions now?
If you mean that an indifferent utility can make these predictions harder or more necessary in some circumstances, then you are correct, but this seems trivial for a superintelligence.