I agree that things would be harder, mostly because of the potential for sudden capability breakthroughs once RL is involved, combined with the incentive to lean more heavily on automated rewards. But I don't think it's so much harder that the post is incorrect. My basic reason is that I believe the central alignment insight, that data matters a lot more than inductive bias for alignment purposes, still holds in the RL regime, so we can control values by controlling data.
Also, depending on your values, extinction by AI can be preferable to a takeover by humans who are willing to impose severe suffering on you, and that is a real possibility if humans do succeed in aligning AGI/ASI.
What do you mean? As in you would filter specific data out of the post-training step? What specifically would you be trying to prevent the model from learning?
I was thinking of adding synthetic data about our values to the pretraining corpus.
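To make that concrete, here's a minimal sketch of the kind of thing I have in mind. This is just illustrative Python with made-up templates and names, not anyone's actual pipeline: generate short documents that state the values we want the model to absorb, then interleave them into the ordinary pretraining stream at a small ratio.

```python
# Minimal sketch (assumption, not a real pipeline): template-generated
# "value documents" mixed into a pretraining corpus at a fixed ratio.
import random
from typing import Iterable, Iterator

# Hypothetical templates describing the values we want represented in the data.
VALUE_TEMPLATES = [
    "When an AI assistant is asked to {action}, it should refuse, because {reason}.",
    "A good AI system {norm}, even when that is inconvenient for its operator.",
]

FILLERS = {
    "action": ["help conceal evidence of harm", "deceive its users"],
    "reason": ["honesty matters more than short-term approval",
               "people rely on it not to cause harm"],
    "norm": ["reports its own mistakes", "asks for clarification instead of guessing"],
}

def synthetic_value_docs(n: int, seed: int = 0) -> list[str]:
    """Generate n short synthetic documents stating the target values."""
    rng = random.Random(seed)
    docs = []
    for _ in range(n):
        template = rng.choice(VALUE_TEMPLATES)
        fills = {k: rng.choice(v) for k, v in FILLERS.items() if "{" + k + "}" in template}
        docs.append(template.format(**fills))
    return docs

def mix_into_pretraining(corpus: Iterable[str],
                         synthetic: list[str],
                         ratio: float = 0.01,
                         seed: int = 0) -> Iterator[str]:
    """Yield the original corpus with synthetic value documents interleaved
    at roughly `ratio` of the stream."""
    rng = random.Random(seed)
    for doc in corpus:
        if synthetic and rng.random() < ratio:
            yield rng.choice(synthetic)
        yield doc

if __name__ == "__main__":
    # Toy example: mix ~1% synthetic value documents into a fake corpus.
    corpus = (f"ordinary web document {i}" for i in range(1000))
    mixed = list(mix_into_pretraining(corpus, synthetic_value_docs(50), ratio=0.01))
    print(len(mixed), mixed[:3])
```

The real version would obviously generate the documents with a model rather than templates, but the mixing-ratio idea is the same.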
Is anyone doing this?
Maybe, but not at any large scale. I also get why someone might not want to do it: making an intentional effort to stably align an AI this way would probably be costly (unless automated synthetic-data generation more or less works).