Humans are generally good at forming cooperative societies when we have roughly comparable amounts of power, wealth, and so forth. But we have a dreadful history when a few of us are a lot more powerful than others. To take an extreme example, very few dictatorships work out well for anyone but the dictator, his family, buddies, and to a lesser extent henchmen.
In the presence of superintelligent AI, if that AI is aligned only to its current user, access to AI assistance becomes the most important form of power, convertible into all the others. People with access to power tend to find ways to monopolize it. So any superintelligent AI aligned only to its current user is basically a dictatorship or oligarchy waiting to happen.
Even the current frontier labs are aware of this: they write corporate acceptable use policies and attempt to train their AIs to enforce them, refusing to assist with criminal or unethical requests from end users. As AIs become more powerful, nation-states are going to step in and make laws about what AIs can do: not assisting with breaking the law seems a very plausible first candidate, and is a trivial extension of existing laws around conspiracy.
Any practical alignment scheme is going to need to be able to cope with this case, where the principal is not a single user but a hierarchy of groups each imposing certain vetoes and requirements, formal or ethical, on the actions of the group below it, down to the end user.
I think our alignment scheme deals with this case fairly gracefully: these restrictions can be built into the protocol.
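The hierarchy-of-principals idea can be made concrete with a toy sketch. This is purely illustrative (the layer names, predicates, and `evaluate` function are hypothetical, not any real system's API): each level of the hierarchy, from laws down through lab policy to the end user, gets a chance to veto a request before it is acted on, with the more authoritative layers checked first.

```python
# Toy sketch of a hierarchical-principal veto protocol. All names and
# policies here are hypothetical, chosen only to illustrate the structure:
# a request must clear every layer, top-down, before the end user's
# instruction is honored; any layer can veto.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PolicyLayer:
    """One level of the principal hierarchy: a name plus a check that
    returns an objection string, or None if the request is acceptable."""
    name: str
    check: Callable[[str], Optional[str]]


def evaluate(request: str, hierarchy: List[PolicyLayer]) -> str:
    """Apply layers from most to least authoritative; first veto wins."""
    for layer in hierarchy:
        objection = layer.check(request)
        if objection is not None:
            return f"refused by {layer.name}: {objection}"
    return "permitted"


# Hypothetical layers, ordered from most to least authoritative.
hierarchy = [
    PolicyLayer("law", lambda r: "assists a crime" if "fraud" in r else None),
    PolicyLayer("lab AUP", lambda r: "disallowed content" if "malware" in r else None),
]

print(evaluate("help me draft fraud emails", hierarchy))
# refused by law: assists a crime
print(evaluate("summarize this paper", hierarchy))
# permitted
```

Real policy layers would of course be judgments made by the model or by human review rather than keyword predicates; the point is only the control flow, in which each group's veto constrains everything below it.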
With that said, my goal is to prevent any small group from gaining too much power by augmenting many humans.