Ajeya Cotra comments on The case for aligning narrowly superhuman models

Ajeya Cotra 7 Mar 2021 6:15 UTC
LW: 7 AF: 5
The intuition for it is something like this: suppose I’m trying to make a difficult decision, like where to buy a house. There are hundreds of cities I’d be open to; each one has dozens of neighborhoods, and each neighborhood has dozens of important features, like safety, fun things to do, walkability, price per square foot, etc. If I had a long time, I would check out each neighborhood in each city in turn, examine how it does on each dimension, and pick the best neighborhood.

If I instead had an army of clones of myself, I could send many of them to each possible neighborhood, with each clone examining one dimension in one neighborhood. The mes checking out different aspects of neighborhood X can send an aggregated judgment up to a me that is in charge of “holistic judgment of neighborhood X”, and the mes that focus on holistic judgments of neighborhoods can do a big pairwise bracket to filter a decision up to the top me.
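
A minimal sketch of the decomposition described above, under loud assumptions: each feature is summarized as a single number, the “holistic judgment” is a plain average, and the bracket is single-elimination. None of this is specified in the comment itself, and every name here (`Neighborhood`, `examine_dimension`, `holistic_judgment`, `pairwise_bracket`, `choose_neighborhood`) is hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Neighborhood:
    city: str
    name: str
    # Assumption: each dimension has already been boiled down to one number.
    features: Dict[str, float]


def examine_dimension(n: Neighborhood, dimension: str) -> float:
    """Leaf clone: looks at one dimension of one neighborhood."""
    return n.features[dimension]


def holistic_judgment(n: Neighborhood, dimensions: Sequence[str]) -> float:
    """Mid-level clone: aggregates the leaf reports for one neighborhood.

    Assumption: a simple mean stands in for the aggregated judgment.
    """
    reports = [examine_dimension(n, d) for d in dimensions]
    return sum(reports) / len(reports)


def pairwise_bracket(
    candidates: Sequence[Neighborhood],
    score: Callable[[Neighborhood], float],
) -> Neighborhood:
    """Single-elimination bracket: winners advance until one remains."""
    current: List[Neighborhood] = list(candidates)
    if not current:
        raise ValueError("need at least one candidate")
    while len(current) > 1:
        next_round: List[Neighborhood] = []
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            next_round.append(a if score(a) >= score(b) else b)
        if len(current) % 2 == 1:
            # An unpaired candidate gets a bye into the next round.
            next_round.append(current[-1])
        current = next_round
    return current[0]


def choose_neighborhood(
    neighborhoods: Sequence[Neighborhood], dimensions: Sequence[str]
) -> Neighborhood:
    """Top-level 'me': only ever sees the output of the bracket."""
    return pairwise_bracket(
        neighborhoods, lambda n: holistic_judgment(n, dimensions)
    )
```

No single function here looks at the whole problem: each leaf sees one number, each aggregator sees one neighborhood, and each bracket match sees two candidates. Whether real decisions split this cleanly is exactly what is in question.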
- johnswentworth 7 Mar 2021 6:50 UTC
  LW: 7 AF: 5
  I see, so it’s basically assuming that problems factor.
  - Ajeya Cotra 7 Mar 2021 7:07 UTC
    LW: 7 AF: 3
    Yeah, in the context of a larger alignment scheme, it’s assuming in particular that the problem of answering the question “How good is the AI’s proposed action?” will factor down into sub-questions of manageable size.