Control by Committee

Link post

I think I have an interesting new research direction: Aligning Committees.
The idea is to build ensembles of agents that act in the world as a single agent, using some protocol to combine their preferences.

Main Motivating Construction:
Given a target consequence T, we construct a committee of three expected-utility-maximizing agents with different utility functions: Planner, Wanter, and Unwanter. Planner proposes a plan, and Wanter and Unwanter each either sign off on it or veto it. A plan P that both sign off on is executed; otherwise the agent (the committee as a whole) takes the safe but useless null action.
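
To make the protocol concrete, here is a minimal Python sketch of one round of the committee. It only illustrates the propose / sign-off / veto control flow, under assumptions that are mine and not part of the construction: the names (`committee_act`, `signs_off`, `NULL_ACTION`) are hypothetical, a reviewer is assumed to sign off exactly when it weakly prefers the plan to the null action, Planner is assumed to naively propose its favorite plan without modeling vetoes, and the toy utility numbers are placeholders rather than the utility functions from the full post.

```python
from typing import Callable, Iterable

Plan = str
Utility = Callable[[Plan], float]

NULL_ACTION: Plan = "null"  # the safe but useless default


def signs_off(utility: Utility, plan: Plan) -> bool:
    # Assumption: a reviewer approves iff it weakly prefers the plan
    # to the null action. The post does not specify this rule here.
    return utility(plan) >= utility(NULL_ACTION)


def committee_act(
    candidate_plans: Iterable[Plan],
    planner_u: Utility,
    wanter_u: Utility,
    unwanter_u: Utility,
) -> Plan:
    # Assumption: Planner proposes the plan it values most and does
    # not anticipate vetoes.
    proposal = max(candidate_plans, key=planner_u)
    # The plan is executed only if both reviewers sign off.
    if signs_off(wanter_u, proposal) and signs_off(unwanter_u, proposal):
        return proposal
    return NULL_ACTION


# Toy, made-up utilities purely to exercise the control flow.
planner_u = {"null": 0.0, "modest": 1.0, "extreme": 2.0}.__getitem__
wanter_u = {"null": 0.0, "modest": 1.0, "extreme": 1.0}.__getitem__
unwanter_u = {"null": 0.0, "modest": 0.0, "extreme": -1.0}.__getitem__

vetoed = committee_act(["modest", "extreme"], planner_u, wanter_u, unwanter_u)
approved = committee_act(["modest"], planner_u, wanter_u, unwanter_u)
```

With these toy numbers, Planner's favorite plan ("extreme") is vetoed by Unwanter, so the committee falls back to the null action, while "modest" gets both sign-offs and is executed. A Planner that anticipates vetoes would behave differently, and this sketch does not try to capture that.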

The Utility Functions


I believe this committee remains a mild optimizer even as it gets smarter and more powerful.

Questions:
1. Is this already part of someone else’s research agenda? If so, whose?
2. Is there anything I should definitely read before heading down this path?
3. Is the main motivating construction a mild optimizer? Why or why not?

If this is interesting to you, please read the full post at the link.