What’s the difference between being given affordances and getting power? If you are given more affordances, you have more power. However, seeking is about doing things that increase the affordances you have.
The difference I was intending is:
The AI is intentionally given affordances by humans.
The AI gains power in a way its creators/builders don’t desire (likely subversively).
Plenty of things I desire happen without me intending for them to happen.
In general, the reference class is misalignment of agents, and AIs aren’t the only agents. We can look at how these terms work in a corporation.
There are certain powers that the CEO of a big corporation intentionally delegates to a mid-level manager. But I think there’s plenty that a CEO appreciates a mid-level manager doing without ever explicitly tasking the manager with it. The CEO likely appreciates it when the mid-level manager autonomously takes charge of solving problems without bothering the CEO about them.
On the other hand, there are also cases where the mid-level manager plays company politics and engineers a situation in which the CEO gives the mid-level manager certain powers that the CEO doesn’t desire to give. The CEO feels forced into it by how the company politics play out, yet the CEO does intentionally grant those powers to the mid-level manager.
Sure, there might be a spectrum (though I do think some cases are quite clear-cut), but I still think the distinction is useful.
I don’t think it’s a spectrum; a spectrum is one-dimensional. The problem with your distinction is that someone might think they are safe from problems arising from power-seeking (in the broader sense) if they prevent the AI from doing things that they don’t desire.
There are probably three variables that matter:
How much agency the human has in the interaction.
How much agency the AI agent has in the interaction.
Whether the AI cooperates or defects in the game-theoretic sense.
If you have a low-agency CEO and a high-agency, very smart middle manager that always cooperates, that middle manager can still acquire more and more power over the organization.
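To make that dynamic concrete, here is a minimal toy simulation. It is entirely my own sketch: the function `simulate` and the parameters `ceo_agency`, `manager_agency`, and `step` are made-up assumptions for illustration, not anything defined in the thread above. The manager always cooperates, i.e. every initiative genuinely helps the organization, yet its share of de-facto control still drifts upward whenever it acts more often than the low-agency CEO reasserts control:

```python
# Toy model of the "low-agency CEO, high-agency cooperating manager" dynamic.
# All names and numbers are illustrative assumptions.
import random

def simulate(rounds: int = 200,
             ceo_agency: float = 0.05,     # how often the CEO actively reasserts control
             manager_agency: float = 0.8,  # how often the manager takes initiative
             step: float = 0.005,          # control shifted per action
             seed: int = 0) -> float:
    """Return the manager's share of de-facto control after `rounds` rounds."""
    rng = random.Random(seed)
    control = 0.0  # fraction of organizational decisions effectively made by the manager

    for _ in range(rounds):
        # The manager always cooperates: every initiative genuinely helps the
        # organization, so there is nothing for the CEO to object to. Each
        # handled problem still becomes something the manager now owns.
        if rng.random() < manager_agency:
            control = min(1.0, control + step)
        # A low-agency CEO only occasionally steps in and takes a decision back.
        if rng.random() < ceo_agency:
            control = max(0.0, control - step)

    return control

if __name__ == "__main__":
    print(f"manager's control share after 200 rounds: {simulate():.2f}")
```

Nothing in this model requires defection: the manager’s control share climbs purely because the manager acts more often than the CEO reclaims decisions.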