# Partial preferences and models

Note: working on a research agenda, hence the large amount of small individual posts, to have things to link to in the main documents.

EDIT: This model is currently obsolete, see here for the most current version.

I've talked about partial preferences and partial models before. I haven't been particularly consistent in terminology so far ("proto-preferences", "model fragments"), but from now on I'll stick with "partial".

## Definitions

So what are partial models, and partial preferences?

Assume that every world is described by the values of $n$ different variables, so that the set of possible worlds is $X \subseteq \mathbb{R}^n$.

A partial model is given by two sets, $Y$ and $Z$, along with an addition map $q : Y \times Z \to X$. Thus for $y \in Y$ and $z \in Z$, $q(y, z)$ is an element of $X$.

We'll want $q$ to have 'reasonable' properties; for the moment I'm imagining $Y$ and $Z$ as manifolds and $q$ as a local homeomorphism. If you don't understand that terminology, it just means that $q$ is well behaved and that, as you move $y$ and $z$ around, you move in every direction in $X$.

A partial preference given the partial model above consists of two values $y_1, y_2 \in Y$, along with the value judgement that:

• for all $z \in Z$, $q(y_1, z)$ describes a better world than $q(y_2, z)$.
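A minimal computational sketch of this definition (all names and values hypothetical; the continuous quantifier over $Z$ is replaced by a finite enumeration):

```python
import itertools

# Hypothetical toy instantiation of a partial model q : Y x Z -> X.
# Worlds in X are tuples of variable values; Y and Z are finite stand-ins
# for the manifolds in the post, and "for all z in Z" is checked by
# exhaustive enumeration rather than over a continuum.

def q(y, z):
    """Addition map: combine a foreground value y with background z into a world."""
    return (y,) + tuple(z)

def better(world_a, world_b):
    """Hypothetical value judgement on worlds: compare the foreground slot."""
    return world_a[0] > world_b[0]

def partially_preferred(y1, y2, Z):
    """y1 is partially preferred to y2 if q(y1, z) beats q(y2, z) for every z."""
    return all(better(q(y1, z), q(y2, z)) for z in Z)

# Background variables within their 'reasonable range' (two binary stand-ins).
Z = list(itertools.product([0, 1], [0, 1]))

print(partially_preferred(1.0, 0.0, Z))  # True: 1.0 beats 0.0 for every background z
```

The quantifier over all of $Z$ is what makes the preference only *partial*: it says nothing about worlds outside the range that $Z$ covers.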

We can generalise to non-linear subspaces, but this version works well for many circumstances.

## Interpretations

The $y \in Y$ are the foreground variables that we care about in our partial model. The $z \in Z$ are the 'background variables' that are not relevant to the partial model at the moment.

So, for example, when I contemplate whether to walk or run back home, then the GDP of Sweden, the distance Voyager 2 is from Earth, the actual value of the cosmological constant, the number of deaths from malaria, and so on, are not actually relevant to that model. They are grouped under the (irrelevant) background variables category.

Notice that these variables are only irrelevant if they are in a 'reasonable range'. If the GDP of Sweden had suddenly hit zero, if Voyager 2 were about to crash into my head, if the cosmological constant suddenly jumped, or if malaria deaths reached a large fraction of the population, then this would affect my walking/running speed.

So the set $Z$ also encodes background expectations about the world. Being able to say that certain values are in an 'irrelevant' range is a key part of symbol grounding and the frame problem: it allows us to separate $Y$ and $Z$ as being, in a sense, complementary or orthogonal to each other. Note that human definitions of $Z$ are implicit, incomplete, and often wrong. But that doesn't matter; whether I believe that worldwide deaths from malaria are in the thousands or in the millions, that's equally irrelevant for my current decision.

In comparison, the $y_1$ and $y_2$ values are much simpler, and are about the factors I'm currently contemplating: one of them involves running, the other walking. The variables of $Y$ could be future health, current tiredness, how people might look at me as I run, how running would make me feel, and how I currently feel about running. Or it could just be a single variable, like the monster behind me with the teeth, or whether I will be home on time to meet a friend.

So the partial preference is saying that, holding the rest of the values of the world constant, and looking only at these issues, I currently prefer running to walking.
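The 'reasonable range' idea above can be sketched as a guard on the partial model (all variable names and ranges here are hypothetical illustrations, not claims about real values):

```python
# Hypothetical sketch: a partial model only applies while the background
# variables sit inside their expected ('reasonable') ranges. Outside those
# ranges, the model is simply silent rather than wrong.
REASONABLE = {
    "malaria_deaths_frac": (0.0, 0.01),     # assumed range, for illustration
    "voyager2_distance_km": (1e9, 1e11),    # likewise assumed
}

def model_applies(z):
    """The run/walk partial model only speaks when every background
    variable is inside its reasonable range."""
    return all(lo <= z[name] <= hi for name, (lo, hi) in REASONABLE.items())

normal = {"malaria_deaths_frac": 1e-5, "voyager2_distance_km": 1.8e10}
weird  = {"malaria_deaths_frac": 0.5,  "voyager2_distance_km": 1.8e10}

print(model_applies(normal))  # True: the 'run over walk' preference applies
print(model_applies(weird))   # False: the partial model says nothing here
```

This matches the point that human definitions of $Z$ are implicit and often wrong: the guard need not be accurate, only wide enough that the decision is insensitive to errors within it.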

## Reinventing the wheel

This whole construction feels like reinventing the wheel: surely someone has designed something like partial models before? What are the search terms I'm missing?

• Hi Stuart,

I'm working my way through your 'Research Agenda v0.9' post, and am therefore going through various older posts to understand things. I wonder if I could ask some questions about the definition you propose here?

First, that $X$ be contained in $\mathbb{R}^n$ for some $n$ seems not so relevant; can I just assume $X$, $Y$ and $Z$ are some manifolds? And we are given some partial order $\le$ on $X$, so that we can refer to one world 'being a better world' than another?

Then, as I understand it, your definition says the following:

Fix $X$, $\le$ and $Z$. Let $Y$ be a manifold and $y_1, y_2 \in Y$. Given a local homeomorphism $q : Y \times Z \to X$, we say that $y_1$ is partially preferred to $y_2$ if for all $z \in Z$, we have $q(y_2, z) \le q(y_1, z)$.

I'm not sure which inequalities should be strict, but this seems non-essential for now. On the other hand, the dependence of this definition on the choice of $Y$ seems somewhat subtle and interesting. I will try to illustrate this in what follows.

First, let us make a new definition. Fix $X$, $\le$, and $Z$ as before. Let $Y' = \{y_1, y_2\}$, a two-element set equipped with the discrete topology, and let $q' : Y' \times Z \to X$ be an immersion of manifolds. We say that $y_1$ is weakly partially preferred to $y_2$ if for all $z \in Z$, we have $q'(y_2, z) \le q'(y_1, z)$.

First, it is clear that partial preference implies weak partial preference. More formally:

Claim 1: Fix $X$, $\le$ and $Z$. Suppose we have a manifold $Y$, points $y_1, y_2 \in Y$, and a local homeomorphism $q : Y \times Z \to X$ such that $y_1$ is partially preferred to $y_2$. Setting $Y' = \{y_1, y_2\}$ with the subspace topology from $Y$ (i.e. discrete), and taking $q'$ to be the restriction of $q$ from $Y \times Z$ to $Y' \times Z$, we have that $y_1$ is weakly partially preferred to $y_2$.

Proof: obvious. ∎

However, the converse can fail if $Z$ is not contractible. First, let's prove that the concepts are equivalent for $Z$ contractible:

Claim 2: Fix $X$, $\le$ and $Z$, and assume that $Z$ is contractible. Suppose we have a two-element set $Y' = \{y_1, y_2\}$ and a map $q' : Y' \times Z \to X$ making $y_1$ weakly partially preferred to $y_2$. Then there exist a manifold $Y$, an injection $i : Y' \to Y$, and a local homeomorphism $q : Y \times Z \to X$ whose restriction to $Y' \times Z$ is $q'$, making $i(y_1)$ partially preferred to $i(y_2)$.

Proof: Let's assume for simplicity of notation that $X$ is equidimensional, say of dimension $d_X$, and write $d_Z$ for the dimension of $Z$. Let $Y$ be the disjoint union of two open balls of dimension $d_X - d_Z$, with $i$ the inclusion of the centres of the balls. Then take an $\epsilon$-neighbourhood of (each copy of) $Z$ in $X$; it is diffeomorphic to a product with $Z$ since the normal bundle of $Z$ in $X$ is trivialisable (c.f. https://math.stackexchange.com/questions/857784/product-neighborhood-theorem-with-boundary). ∎

If we want examples where weak partial preference and partial preference don't coincide, we should look for an example where $Z$ is not contractible, and its normal bundle in $X$ is not trivial.

Example 3: Let $X$ be the disjoint union of two Moebius bands, and let $Z$ be a circle. Note that including $Z$ along the centre of either band gives a submanifold whose tubular neighbourhood is not a product. Assume that $\le$ is such that one component of $X$ is preferred to the other (and is indifferent within each connected component). Then take $Y' = \{y_1, y_2\}$, and $q'$ to be the inclusion of the two circles along the centres of the two Moebius bands, such that $y_1$ ends up in the preferred band. This yields a situation where $y_1$ is weakly partially preferred to $y_2$, but the conclusion of Claim 2 fails, i.e. this cannot be extended to a partial preference for $y_1$ over $y_2$.

What conclusion should we draw from this? To me, it suggests that the notion of partial preference is not yet quite as one would want. In the setting of Example 3, where $X$ consists of two Moebius strips, one of which is preferred to the other, landing in the preferred strip should surely be preferred to landing in the un-preferred strip?! And yet the 'local homeomorphism from a product' condition gets in the way. This example is obviously quite artificial, and maybe analogous things cannot occur in reality. But I'm not so happy with this as an answer, since our approaches to AI safety should be (so far as possible) robust against the flaws in our understanding of physics.

Apologies for the overly long comment, and for the imperfect LaTeX (I've not used this type of forum much before).

• Hey there! Thanks for your long comment, but, alas, this model of partial preferences is obsolete :-(

Because of other problems with this model, I've replaced it with the much more general concept of a preorder. This can express all the things we want to express, but it is a lot less intuitive as a description of how humans model things. I may come up with some alternative definition at some point (less general than a preorder, but more general than this post).
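For concreteness, a preorder over worlds can be sketched as an explicit relation closed under reflexivity and transitivity (a hypothetical toy, not the actual replacement definition):

```python
from itertools import product

# Hypothetical sketch: a preorder over a finite set of worlds. A preorder
# only needs reflexivity and transitivity; unlike the partial-model
# definition, it can relate any pair of worlds directly.
worlds = ["w_run", "w_walk", "w_other"]

# Generating relation: the run-world is at least as good as the walk-world.
relation = {("w_run", "w_walk")}

def preorder_closure(worlds, rel):
    """Close a relation under reflexivity and transitivity."""
    closed = set(rel) | {(w, w) for w in worlds}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed

pre = preorder_closure(worlds, relation)
print(("w_run", "w_walk") in pre)   # True: the expressed preference
print(("w_walk", "w_run") in pre)   # False: a preorder need not be total
```

The incomparability of `w_other` with everything else is the generality being bought: a preorder can leave arbitrary pairs of worlds unranked, where the partial-model definition forces a product structure.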

Thanks for the comment in any case.

• Re: Reinventing the wheel

I don't know of any slam-dunk search term, but I suspect that the discussion you want to have surrounding partial preferences will contain many similarities to the work done on ceteris paribus laws. In particular, if we aggregate the partial preferences of all moral agents, we will produce something like a moral ceteris paribus law, where we are holding the set $Z$ of background variables "unchanged" (i.e. within a "reasonable" range of values). You might find the discussion around the justification of CP laws useful.

Additionally, I believe there must be some relevant work on the application of modal logic and possible-world semantics to morality. I don't have anything to point to here, but it might be a worthwhile direction.

• Side note: what do you think about preferences about the preferences of other people?

For example: "I want M. to love me" or "I prefer that everybody be utilitarian".

Was it covered somewhere?

• Those are very normal preferences; they refer to states of the outside world, and we can estimate whether that state is met or not. Just because such a preference is potentially manipulative doesn't mean it isn't well-defined.

• But they are somehow recursive: I need to know the real nature of human preferences in order to be sure that other people actually want what I want.

In other words, such preferences about preferences embed an idea of what I think a "preference" is: if M. behaves as if she loves me, is that enough? Or should it be her claims of love? Or her emotions? Or the coherency of all three?

• How does this notion of partial preferences differ from saying "preferences are determined by a causal net"? I.e., the $y$'s would be the direct causal parents of a decision, and the $z$'s everything else.

• This differs because the $z$ are assumed to be in a "standard" range. There are situations where extreme values of $z$, if known and reflected upon, would change the sign of the decision (for example, what if your decision is being filmed, and there are billions being bet on your ultimate choice by various moral and immoral groups?).

But yeah, if you assume that the $z$ are in that standard range, then this looks a lot like considering just a few nodes of a causal net.
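The sign-flip point in this reply can be illustrated with a toy sketch (all values and the utility function are hypothetical):

```python
# Hypothetical illustration: with background z in its standard range the
# decision looks like a function of the foreground y alone, but an extreme
# z value (being filmed, with billions bet on the outcome) flips the sign.
def decision_value(y_run, z_being_filmed):
    """Toy utility: running beats walking, unless the extreme background
    condition reverses the comparison."""
    base = 1.0 if y_run else 0.0
    return -base if z_being_filmed else base

# Standard range of z: run preferred to walk.
standard_run, standard_walk = decision_value(True, False), decision_value(False, False)
# Extreme z: the comparison reverses.
extreme_run, extreme_walk = decision_value(True, True), decision_value(False, True)

print(standard_run > standard_walk)  # True: run preferred in the standard range
print(extreme_run > extreme_walk)    # False: extreme z flips the decision
```

A causal net would represent the dependence on `z_being_filmed` explicitly; the partial model instead assumes it away by restricting $z$ to the standard range.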