johnswentworth comments on Value extrapolation partially resolves symbol grounding

johnswentworth 12 Jan 2022 16:53 UTC
LW: 15 AF: 11
That might work in a tiny world model with only two possible hypotheses. In a high-dimensional world model with exponentially many hypotheses, the weight on happy humans would be exponentially small.
- Quintin Pope 13 Jan 2022 3:28 UTC
  3 points
  Wouldn’t there also be exponentially many variants of the “happy humans” hypothesis? What we’re really interested in is the total probability assigned to all hypotheses whose fulfillment leads to human happiness. Once you’ve trained on videos of happy humans, I think there’s plausibly enough probability mass on happy-humans hypotheses that the AI will actually cause a fair amount of happiness.
  - JBlack 15 Jan 2022 6:01 UTC
    2 points
    There would, so long as the extra dimensions are irrelevant. If there are more relevant dimensions, then the total space becomes larger much faster than the happy space. Even having lots of irrelevant dimensions can be risky, because it makes the training data sparser in the space being modelled, thus making superexponentially many more alternative hypotheses viable.
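
The counting argument in this thread can be made concrete with a toy calculation. This sketch is not from any of the commenters, and it rests on assumptions they did not state: hypotheses are bit strings of length `n_total`, the prior is uniform, and a hypothesis counts as "happy" exactly when its first `n_relevant` bits are all 1. The function names are hypothetical. It does not model the last point about sparse training data.

```python
from fractions import Fraction


def single_hypothesis_mass(n_total: int) -> Fraction:
    """Uniform prior mass on one fully specified hypothesis (assumed toy model)."""
    return Fraction(1, 2 ** n_total)


def happy_mass(n_total: int, n_relevant: int) -> Fraction:
    """Total prior mass on all 'happy' hypotheses in the toy model.

    A hypothesis is 'happy' iff its first n_relevant bits are all 1;
    the remaining n_total - n_relevant bits are free (irrelevant).
    """
    if not 0 <= n_relevant <= n_total:
        raise ValueError("need 0 <= n_relevant <= n_total")
    happy_count = 2 ** (n_total - n_relevant)  # one variant per setting of the free bits
    total_count = 2 ** n_total
    return Fraction(happy_count, total_count)


if __name__ == "__main__":
    # Adding irrelevant dimensions shrinks each single hypothesis exponentially,
    # but leaves the summed 'happy' mass unchanged.
    for n in (10, 100):
        print(n, single_hypothesis_mass(n), happy_mass(n, n_relevant=2))
    # Adding relevant dimensions shrinks the summed 'happy' mass exponentially.
    for k in (2, 10, 50):
        print(k, happy_mass(100, n_relevant=k))
```

Under these assumptions the summed happy mass is 2^(-n_relevant), whatever the number of irrelevant dimensions. So the first comment's worry applies to a single hypothesis, the reply's point applies to the sum over variants, and the final comment's caveat is that the sum still collapses once the added dimensions are relevant.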