Max Harms comments on 0. CAST: Corrigibility as Singular Target

Max Harms 6 May 2025 18:08 UTC
1 point
0
Alas, I’m not very familiar with Recursive Alignment. I see some similarities, such as the notion of trying to set up a stable equilibrium in value-space. But a quick peek does not make me think Recursive Alignment is on the right track. In particular, I strongly disagree with this opening bit:
What I propose here is to reconceptualize what we mean by AI alignment. Not as alignment with a specific goal, but as alignment with the process of aligning goals with each other. An AI will be better at this process the less it identifies with any side...
What appeals to you about it?
- Ram Potham 7 May 2025 0:24 UTC
  3 points
  0
  Parent
  I believe a recursively aligned AI model would be more aligned and safe than a corrigible model, although both would be susceptible to misuse.
  Why do you disagree with the above statement?
  - Max Harms 7 May 2025 15:28 UTC
    1 point
    0
    Parent
    My reading of the text might be wrong, but it seems like bacteria count as living beings with goals? More speculatively, possible organisms that might exist somewhere in the universe also count for the consensus? Is this right?
    If so, a basic disagreement is that I don’t think we should hand over the world to a “consensus” that is a rounding error away from 100% inhuman. That seems like a good way of turning the universe into ugly squiggles.
    If the consensus mechanism has a notion of power, such that creatures that are disempowered have no bargaining power in the mind of the AI, then I have a different set of concerns. But I wasn’t able to quickly determine how the proposed consensus mechanism actually works, which is a bad sign from my perspective.