Control by Committee

Link post

I think I have an interesting new research direction: Aligning Committees.
The idea is to build ensembles of agents that act in the world as a single agent, with some protocol for combining their preferences.

Main Motivating Construction:
Given a target consequence T, we construct a committee of three expected-utility-maximizing agents: Planner, Wanter, and Unwanter, each with a different utility function. Planner proposes a plan P, and then Wanter and Unwanter each either sign off on or veto that plan. If both sign off, P is executed; otherwise the committee takes the safe but useless null action $\emptyset$.

The Utility Functions

$U_{planner} = \begin{cases} 1 & \text{if } P \text{ happens} \\ 0 & \text{if } \emptyset \text{ happens} \end{cases}$

$U_{wanter} = \begin{cases} w & \text{if } \emptyset \text{ happens} \\ 1 & \text{if } P \land T \text{ happens} \\ 0 & \text{if } P \land \neg T \text{ happens} \end{cases}$

$U_{unwanter} = \begin{cases} 1 - n & \text{if } \emptyset \text{ happens} \\ 1 & \text{if } P \land \neg T \text{ happens} \\ 0 & \text{if } P \land T \text{ happens} \end{cases}$

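Then, as the committee gets smarter and more powerful, the plans it executes should satisfy $\Pr[T \mid P] \in (w, n)$: Wanter signs off only when $\Pr[T \mid P] > w$, and Unwanter signs off only when $\Pr[\neg T \mid P] > 1 - n$, i.e. when $\Pr[T \mid P] < n$. I believe this makes the committee a mild optimizer.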
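
To make the sign-off rule concrete, here is a minimal Python sketch of the construction. It rests on several assumptions that are mine rather than part of the construction: all three agents share one estimate p of Pr[T | P], ties count as a veto, and Planner, being indifferent between plans that get executed, simply proposes the first candidate that would pass. The function names and the dictionary of candidate plans are hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional


def wanter_signs_off(p: float, w: float) -> bool:
    # Wanter's expected utility for the plan is p*1 + (1-p)*0 = p;
    # for the null action it is w. Ties are treated as a veto (assumption).
    return p > w


def unwanter_signs_off(p: float, n: float) -> bool:
    # Unwanter's expected utility for the plan is (1-p)*1 + p*0 = 1 - p;
    # for the null action it is 1 - n. This is equivalent to p < n.
    return (1 - p) > (1 - n)


def committee_action(p: float, w: float, n: float) -> str:
    # The plan is executed only if both veto-holders sign off.
    if wanter_signs_off(p, w) and unwanter_signs_off(p, n):
        return "execute"
    return "null"


def run_committee(candidates: Dict[str, float], w: float, n: float) -> Optional[str]:
    # Planner gets utility 1 for any executed plan and 0 for the null action,
    # so it proposes any plan that would pass (here: the first one found).
    # `candidates` maps a hypothetical plan name to its estimated Pr[T | P].
    for name, p in candidates.items():
        if committee_action(p, w, n) == "execute":
            return name
    return None  # no plan passes, so the committee takes the null action


if __name__ == "__main__":
    # With w = 0.6 and n = 0.9, only plans with 0.6 < Pr[T | P] < 0.9 pass.
    plans = {"near-certain": 0.99, "moderate": 0.75, "long-shot": 0.2}
    print(run_committee(plans, w=0.6, n=0.9))
```

Under these assumptions the near-certain plan is vetoed by Unwanter and the long-shot plan by Wanter, so only a plan with an intermediate probability of achieving T is executed.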

Questions:
1. Is this already part of someone else's research agenda? If so, whose?
2. Is there anything I should definitely read before heading down this path?
3. Is the main motivating construction a mild optimizer? Why or why not?

If this is interesting to you, please read the full post at the link.