Unfortunately, I think that tool AIs want to become agents
Tool AIs are probably key components of aligned, or at least debuggable and instruction-following, AGIs. If you have aligned AGIs, it's probably trivial to build misaligned agents using the same methods, whether or not tool AIs were among their components. Perhaps even blind-to-the-world pivotal AIs could instead be trained on real-world datasets to become general agents. So this is hardly an argument against a line of alignment investigation, as this danger seems omnipresent.
Unfortunately, it tends to come up in that context. For example, Drexler felt compelled to disclaim in a recent post:
My intention is not to disregard agent-focused concerns — their importance is assumed, not debated. Indeed, the AI services model anticipates a world in which dangerous superintelligent agents could emerge with relative ease, and perhaps unavoidably. My aim is to broaden the working ontology of the community to include systems in which superintelligent-level capabilities can take a more accessible, transparent, and manageable form, open agencies rather than unitary agents.
I agree. I didn’t intend it as an argument against that line of research, because I think adapting oracles into agents is inevitable. It’s notable that Drexler says the same thing, and his idea of having controlled strong AI systems as a counterbalance is interesting.