Rohin Shah 1 Jun 2019 3:21 UTC
2 points
These seem reasonable as ways in which machine learning can fail, but how do any of them lead to a treacherous turn that kills all humans?
- Pattern 1 Jun 2019 21:15 UTC
  3 points
  They’re giving examples of deception being learned which don’t meet their starting assumptions:
  i) We’re considering a seed AI able to recursively self-improve without human intervention.
  ii) There is some discontinuity at the conception of deception, i.e. when it first thinks of its treacherous turn plan.
  I think this is being presented because a treacherous turn requires deception. (This may be a necessary condition, but not a sufficient one.)
  - Rohin Shah 2 Jun 2019 17:09 UTC
    2 points
    “I think this is being presented because a treacherous turn requires deception.”
    Right; my claim is that deception learned in this way will not lead to a treacherous turn, because the agent here is learning a deceptive policy, as opposed to learning the concept of deception, which is what you would typically need for a treacherous turn.
    - Michaël Trazzi 3 Jun 2019 14:01 UTC
      6 points
      I agree that these stories won’t (naturally) lead to a treacherous turn. Continuously learning to deceive (an ML failure in this case, as you mentioned) is a different outcome. The story, or the learning process, would have to be substantially different to lead to “learning the concept of deception” (which would mean reaching an AGI-level ability to reason about such abstract concepts), but maybe there’s a way to learn those concepts with only narrow AI.
  - countingtoten 2 Jun 2019 7:29 UTC
    1 point
    
    “I think this is being presented because a treacherous turn requires deception.”
    
    As I’ve mentioned before, that is technically false (unless you want a gerrymandered definition).