To me it looks like the main issues are in configuring the “delegates” so that they don’t “negotiate” quite like real agents—for example, there’s no delegate that will threaten to adopt an extremely negative policy in order to gain negotiating leverage over other delegates.
The part where we talk about these negotiations seems to me like the main pressure point on the moral theory qua moral theory—can we point to a form of negotiation that is isomorphic to the “right answer”, rather than just being an awkward tool to get closer to the right answer?
The threats problem seems like a specific case of problems that might arise by putting real intelligence into the agents in the system. Especially if this moral theory were being run on a superintelligent AI, it seems like the agents might be able to come up with all sorts of creative unexpected stuff. And I’m doubtful that creative unexpected stuff would make the parliament’s decisions more isomorphic to the “right answer”.
One way to solve this problem might be to drop any notion of “intelligence” in the delegates and instead specify a deterministic algorithm that any individual delegate follows in deciding which “deals” they accept. Or take the same idea even further and specify a deterministic algorithm for resolving moral uncertainty that is merely inspired by the function of parliaments, in the same sense that the stable marriage problem and algorithms for solving it could have been inspired by the way people decide who to marry.
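To make the "deterministic delegate" idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Delegate` class, the `accepts` rule (accept a deal if and only if it is at least as good as a fixed fallback outcome), and the toy utilities are all hypothetical, not part of the parliamentary model as proposed. The point is only that a delegate following a fixed rule like this has no way to issue threats, because it has no strategy at all.

```python
from dataclasses import dataclass


@dataclass
class Delegate:
    """A hypothetical non-strategic delegate for one moral theory."""
    name: str
    utilities: dict  # option -> how much this theory values the option

    def accepts(self, deal: str, fallback: str) -> bool:
        # Fixed rule (an assumption): accept exactly those deals that are
        # at least as good for this theory as the fallback outcome.
        return self.utilities[deal] >= self.utilities[fallback]


def unanimous_deals(delegates, options, fallback):
    """Return the options that every delegate accepts over the fallback."""
    return [o for o in options if all(d.accepts(o, fallback) for d in delegates)]


# Toy example with made-up numbers.
utilitarian = Delegate("utilitarian", {"A": 1.0, "B": 0.6, "C": 0.0})
deontologist = Delegate("deontologist", {"A": 0.0, "B": 0.7, "C": 0.5})

print(unanimous_deals([utilitarian, deontologist], ["A", "B", "C"], fallback="C"))
```

A rule this simple obviously throws away most of what makes negotiation interesting, and the choice of fallback does a lot of work. It is meant as a starting point for asking which fixed rule we would actually endorse.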
Eliezer’s notion of a “right answer” sounds appealing, but I’m a little skeptical. In computer science, it’s possible to prove that a particular algorithm, when run, will always achieve the maximal “score” on a criterion it’s attempting to optimize. But in this case, if we could formalize a score we wanted to optimize for, that would be equivalent to solving the problem! That’s not to say this is a bad angle of approach, however… it may be useful to take the idea of a parliament and use it to formalize a scoring system that captures our intuitions about how different moral theories trade off and then maximize this score using whatever method seems to work best. For example (waves hands), perhaps we could score the total regret of our parliamentarians and minimize that.
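As a hand-wavy illustration of the "total regret" idea, here is one possible formalization in Python. The definitions are assumptions made for the sketch: a theory's regret for an option is taken to be the gap between its best available option and the chosen one, and the total is weighted by our credence in each theory. The function names and the numbers are hypothetical.

```python
def regret(utilities, option):
    """Regret of one theory: utility of its best option minus the chosen one."""
    return max(utilities.values()) - utilities[option]


def min_total_regret(theories, credences, options):
    """Pick the option with the smallest credence-weighted total regret."""
    def total(option):
        return sum(credences[name] * regret(u, option)
                   for name, u in theories.items())
    return min(options, key=total)


# Toy example with made-up numbers.
theories = {
    "utilitarian": {"A": 1.0, "B": 0.6, "C": 0.0},
    "deontologist": {"A": 0.0, "B": 0.7, "C": 0.5},
}
credences = {"utilitarian": 0.5, "deontologist": 0.5}

print(min_total_regret(theories, credences, ["A", "B", "C"]))
```

Note that with this particular definition, each theory's best-case utility is a constant, so minimizing weighted regret selects the same option as maximizing credence-weighted utility. That rather supports the worry above: the hard part is choosing the score (including how to put different theories' utilities on a common scale), not optimizing it.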
Another approach might be to formalize a set of criteria that a good solution to the problem of moral uncertainty should achieve and then set out to design an algorithm that achieves all of these criteria. In other words, we would write a formal problem description that’s more like that of the stable marriage problem (satisfy a criterion) and less like that of the assignment problem (optimize a score).
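For readers who haven't seen it, the stable marriage problem is a good example of a "criteria" style specification: the requirement is that no two participants would both rather be matched to each other than to their assigned partners, and the Gale-Shapley algorithm provably produces a matching that meets it. Below is a short Python sketch of that algorithm together with a checker for the stability criterion. It assumes equally many proposers and receivers, each with a complete preference list; the participant names are made up.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict proposer -> receiver."""
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    free = list(proposer_prefs)
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}  # receiver -> proposer
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]  # best receiver not yet tried
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p          # r trades up; the old partner is free again
            free.append(current)
        else:
            free.append(p)          # r rejects p; p will try the next choice
    return {p: r for r, p in engaged.items()}


def is_stable(matching, proposer_prefs, receiver_prefs):
    """Check the criterion directly: no pair prefers each other to their matches."""
    partner_of = {r: p for p, r in matching.items()}
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if r == matching[p]:
                break  # every later receiver is one p likes less than its match
            if receiver_prefs[r].index(p) < receiver_prefs[r].index(partner_of[r]):
                return False  # p and r would both rather be together
    return True


proposers = {"a": ["x", "y", "z"], "b": ["x", "z", "y"], "c": ["y", "x", "z"]}
receivers = {"x": ["b", "a", "c"], "y": ["a", "c", "b"], "z": ["a", "b", "c"]}
matching = gale_shapley(proposers, receivers)
print(matching, is_stable(matching, proposers, receivers))
```

The analogy I have in mind is that `is_stable` plays the role of the problem description and `gale_shapley` the role of the algorithm proven to satisfy it. For moral uncertainty we don't yet have the analogue of `is_stable`, which is the part that needs the most thought.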
So one plan of attack on the moral uncertainty problem might be:
1. Generate a bunch of “problem descriptions” for moral uncertainty that specify a set of criteria to satisfy/optimize.
2. Figure out which “problem description” best fits our intuitions about how moral uncertainty should be solved.
3. Find an algorithm that provably solves the problem as specified in its description.
Eliezer Yudkowsky: