The big thing Eliezer seems to believe, which I don’t think anyone in mainstream AI believes, is that shoving a consequentialist with preferences about the real world into your optimization algorithm is gonna be the key to making it a lot more powerful.
From the article you linked:
Point two: The class of catastrophe I’m worried about mainly happens when a system design is supposed to contain two consequentialists that are optimizing for different consequences, powerfully enough that they will, yes, backchain from Y to X whenever X is a means of influencing or bringing about Y, doing lookahead on more than one round, and so on. When someone talks about building a system design out of having two of *those* with different goals, and relying on their inability to collude, that is the point at which I worry that we’re placing ourselves into the sucker’s game of trying to completely survey a rich strategic space well enough to outsmart something smarter than us.
[emphasis mine]
The piece seems to be about how trying to control AI by dividing power among multiple agents is a bad idea, because then we’re doomed if those agents ever figure out how to collude with each other.
Why would you put two consequentialists in your system that are optimizing for different sets of consequences? A consequentialist is a high-level component, not a low-level one. Anthropomorphic bias might lead you to believe that a “consequentialist agent” is ontologically fundamental, a conceptual atom which can’t be divided. But this doesn’t really seem to be true from a software perspective: a consequentialist decomposes into smaller pieces, say a world model, a utility function, and a search procedure, and none of those pieces is itself an agent with preferences.
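To make that concrete, here’s a minimal sketch of the decomposition (all the names and the one-step-lookahead planner are hypothetical, purely for illustration, not anyone’s actual proposal):

```python
from typing import Callable, Iterable

# Hypothetical types: a "state" is any description of the world,
# an "action" is any label the planner can enumerate.
State = str
Action = str

def plan(
    state: State,
    actions: Iterable[Action],
    model: Callable[[State, Action], State],  # world model: predicts outcomes
    utility: Callable[[State], float],        # preferences over outcomes
) -> Action:
    """One-step lookahead: pick the action whose predicted outcome scores highest."""
    return max(actions, key=lambda a: utility(model(state, a)))

# None of the three ingredients is a consequentialist on its own:
# - `model` just predicts, with no preferences;
# - `utility` just scores states, with no ability to act;
# - `plan` just searches, with no goals of its own.
# "Consequentialist agent" only describes the composed system.

if __name__ == "__main__":
    # Toy example: a thermostat-like chooser.
    model = lambda s, a: {"heat": "warm", "cool": "cold", "wait": s}[a]
    utility = lambda s: {"warm": 1.0, "cold": -1.0, "room": 0.0}[s]
    print(plan("room", ["heat", "cool", "wait"], model, utility))  # -> "heat"
```

A real system would backchain over many steps rather than one, but the point stands: “consequentialist” names the composed pipeline, and you can swap out or delete any piece without touching the others, which is not how a conceptual atom behaves.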