Using vector fields to visualise preferences and make them consistent

This post was written for Convergence Analysis by Michael Aird, based on ideas from Justin Shovelain and with ongoing guidance from him. Throughout the post, “I” will refer to Michael, while “we” will refer to Michael and Justin or to Convergence as an organisation.

Epistemic status: High confidence in the core ideas on an abstract level. Claims about the usefulness of those ideas, their practical implications, and how best to concretely/mathematically implement them are more speculative; one goal in writing this post is to receive feedback on those things. I’m quite new to many of the concepts covered in this post, but Justin is more familiar with them.

Overview

This post outlines:

  • What vector fields are

  • How they can be used to visualise preferences

  • How utility functions can be generated from “preference vector fields” (PVFs)

  • How PVFs can be extrapolated from limited data on preferences

  • How to visualise inconsistent preferences (as “curl”)

  • A rough idea for how to “remove curl” to generate consistent utility functions

  • Possible areas for future research

We expect this to provide useful tools and insights for various purposes, most notably AI alignment, existential risk strategy, and rationality.

This post is structured modularly; different sections may be of interest to different readers, and should be useful in isolation from the rest of the post. The post also includes links to articles and videos introducing relevant concepts, to make the post accessible to readers without relevant technical backgrounds.

Vector fields and preferences

A vector represents both magnitude and direction; for example, velocity is a vector that represents not just the speed at which one is travelling but also the direction of travel. A vector field essentially associates a vector to each point in a region of space. For example, the following image (source) shows the strength (represented by arrow lengths) and direction of the magnetic field at various points around a bar magnet:

Figure 1.

Another common usage of vector fields is to represent the direction in which fluid would flow, for example the downhill flow of water on uneven terrain (this short video shows and discusses that visualisation).

We believe that vector fields over “state spaces” (possible states of the world, represented by positions along each dimension) can be a useful tool for analysis and communication of various issues (e.g., existential risk strategy, AI alignment). In particular, we’re interested in the idea of representing preferences as “preference vector fields” (PVFs), in which, at each point in the state space, a vector represents which direction in the state space an agent would prefer to move from there, and how intense that preference is.[1] (For the purposes of this post, “agent” could mean an AI, a human, a community, humanity as a whole, etc.)

To illustrate this, the following PVF shows a hypothetical agent’s preferences over a state space in which the only dimensions of interest are wealth and security.[2][3]

Figure 2.

The fact that (at least over the domain shown here) the arrows always point at least slightly upwards and to the right shows that the agent prefers more wealth and security to less, regardless of the current level of those variables. The fact that the arrows are longest near the x axis shows that preferences are most intense when security is low. The fact that the arrows become gradually more horizontal as we move up the y axis shows that, as security increases, the agent comes to care more about wealth relative to security.
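(For readers who want to experiment with PVFs themselves, here is a minimal sketch in Python of how one might plot a field with this qualitative shape. The specific equations are invented for illustration; they are not the equations behind this post’s figures, which are listed in footnote 3.)

```python
# A minimal sketch of plotting a hypothetical 2D preference vector field.
# The field chosen here is an invented illustration: arrows always point up
# and to the right, with the preference for security fading as security rises.
import numpy as np
import matplotlib.pyplot as plt

# Grid of states: x = wealth, y = security (arbitrary units).
x, y = np.meshgrid(np.linspace(0.5, 5, 15), np.linspace(0.5, 5, 15))

u = np.ones_like(x)  # preferred change in wealth: constant
v = 2.0 / y          # preferred change in security: strongest when y is low

plt.quiver(x, y, u, v)
plt.xlabel("wealth")
plt.ylabel("security")
plt.title("A hypothetical preference vector field")
plt.show()
```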

Not only preferences

In a very similar way, vector fields can be used to represent things other than preferences. For example, we might suspect that for many agents (e.g., most/all humans), preferences do not perfectly match what would actually make the agent happier (e.g., because of the agent being mistaken about something, or having separate systems for reward vs motivation). In this case, we could create a vector field to represent the agent’s preferences (represented by the blue arrows below), and another to represent what changes from any given point would increase the agent’s happiness (represented by the green arrows).

Figure 3.

This method of layering vector fields representing different things can be used as one tool in analysing potential clashes between different things (e.g., between an agent’s preferences and what would actually make the agent happy, or between an agent’s beliefs about what changes would be likely at each state and what changes would actually be likely at each state).

For example, the above graph indicates that, as wealth and/or security increases (i.e., as we move along the x axis and/or up the y axis), there is an increasing gap between the agent’s preferences and what would make the agent happy. In particular, security becomes increasingly more important than wealth for the agent’s happiness, but this is not reflected in the agent’s preferences.

(Note that, while it does make sense to compare the direction in which arrows from two different vector fields point, I haven’t yet thought much about whether it makes sense to compare the lengths Grapher shows for their arrows. It seems like this is mathematically the same as the common problem of trying to compare utility functions across different agents, or preferences across different voters. But here the functions represent different things within the same agent, which may make a difference.)

Gradients and utility functions

When a vector field has no “curl” (see the section “Curl and inconsistent preferences” below), the vector field can be thought of as the gradient of a scalar field.[4] (A scalar field is similar to a vector field, except that it associates a scalar with each point in a region of space, and scalars have only magnitude, rather than magnitude and direction.) Essentially, this means that the arrows of the vector field can be thought of as pointing “uphill”, away from low points and towards high points of the associated scalar function. If the vector field represents preferences, higher points of the scalar function would be where preferences are more satisfied, and lower points would be where they are less satisfied; thus, the scalar function can be thought of as the agent’s utility function.[5] (The same basic method is often used in physics, in which context the scalar function typically represents scalar potential.)
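As a concrete toy example (mine, not one of the fields used for this post’s figures): the 2D field

$$\vec{F}(x, y) = (y,\ x) \qquad \text{has} \qquad \frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y} = 1 - 1 = 0,$$

so it is curl-free, and it is the gradient of the scalar function $U(x, y) = xy + C$ for any constant $C$ (see footnote 5): indeed, $\nabla U = (\partial U/\partial x,\ \partial U/\partial y) = (y,\ x) = \vec{F}$.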

Below is one visualisation of the scalar field representing the utility function of the agent from the previous example (based on its preferences, not on what would make it “happy”), as well as the related vector field. Colours towards the red end of the spectrum represent higher values of the scalar field. It can be seen that the arrows of the vector field point away from blue areas and towards red areas, representing the agent’s preference for “climbing uphill” on its utility function.

Figure 4.

The scalar field can also be represented in three dimensions, as values on the z dimension, which are in turn a function of values on the x and y dimensions. This is shown below (from two angles), for the same agent. (These graphs are a little hard to interpret from still images on a 2D screen, at least with this function; such graphs can be easier to interpret when one is able to rotate the angle of view.)

Figures 5a and 5b.

Method

This video provides one clear explanation of the actual method for determining the scalar function that a curl-free vector field can be thought of as the gradient of (though the video is focused on cases of 3D vector fields). That video describes this as finding the “potential”; as noted earlier, when the vector field represents preferences, the utility function can be thought of as analogous to the “potential” in other cases.

Personally, as a quick method of finding the scalar function associated with a 2D vector field, I used the following command, from the first answer on this page:

DSolve[{D[f[x, y], x] == [X COMPONENT OF THE VECTOR FIELD], D[f[x, y], y] == [Y COMPONENT OF THE VECTOR FIELD]}, f[x, y], {x, y}]

I entered this command into a Wolfram Cloud notebook, which seems to be free to use as long as you create an account. (As noted in the answer on the linked page, this command will come back with no solution if the vector field has curl. This makes sense, because this general approach cannot be used in this way if a field has curl; this is explained in the section “Curl and inconsistent preferences” below.) Finally, I double-checked that the function was a valid solution by using this calculator to find its gradient, which should then be the same as the original vector field.
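(For those without access to Mathematica, here is a rough sympy equivalent. This is my own translation of the approach, not the exact workflow used for this post, and it uses the toy curl-free field F = (y, x) from earlier rather than any of the post’s actual equations.)

```python
# A rough sympy analogue of the DSolve approach: recover a scalar potential
# for a curl-free 2D vector field by direct integration.
import sympy as sp

x, y = sp.symbols("x y")
Fx, Fy = y, x  # components of the toy curl-free field

# 2D "curl" (a scalar): dFy/dx - dFx/dy. It must be zero for a potential to exist.
assert sp.simplify(sp.diff(Fy, x) - sp.diff(Fx, y)) == 0

# Integrate Fx with respect to x, then fix up the leftover function of y.
U = sp.integrate(Fx, x)                                # gives x*y + g(y), g unknown
U += sp.integrate(sp.simplify(Fy - sp.diff(U, y)), y)  # recovers g(y); here it is 0

print(U)  # x*y (taking the arbitrary constant to be 0)

# Sanity check: the gradient of U should match the original field.
assert sp.simplify(sp.diff(U, x) - Fx) == 0
assert sp.simplify(sp.diff(U, y) - Fy) == 0
```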

Extrapolating PVFs (and utility functions) from specific preference data

In reality, one rarely knows an agent’s actual utility function or their full PVF. Instead, one is likely to only have data on the agent’s (apparent) preferences at particular points in state space; for example, the extent to which they wanted more wealth and more security when they had $10,000 of savings and a “4/5” level of security.

One can imagine extrapolating a full preference vector field (PVF) from that data. We do not know of a precise method for actually doing this (we plan to do more research and thought regarding that in future). However, conceptually speaking, it seems the process would be analogous to fitting a regression line to observed data points, and, like that process, would require striking a balance between maximising fit with the data and avoiding overfitting.
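(To make the regression analogy concrete, here is one naive sketch, under the strong and entirely assumed simplification that the field is affine in the state variables; the data points are made up. This is not a method we endorse, just an illustration of the kind of fitting involved.)

```python
# A naive sketch of extrapolating a PVF from sparse preference data:
# fit an affine vector field F(p) = A p + b to observed preference vectors
# by ordinary least squares. All numbers below are made up.
import numpy as np

# Observed states (rows) and the preference vector reported at each state.
states = np.array([[1.0, 1.0], [2.0, 3.0], [4.0, 2.0], [3.0, 4.0]])
prefs = np.array([[1.0, 2.0], [1.0, 1.0], [0.5, 1.5], [0.8, 0.9]])

# Design matrix [x, y, 1], so each component of F is affine in the state.
X = np.hstack([states, np.ones((len(states), 1))])
coef, *_ = np.linalg.lstsq(X, prefs, rcond=None)  # shape (3, 2)

def predict_preference(point):
    """Extrapolated preference vector at an unobserved state."""
    return np.append(point, 1.0) @ coef

print(predict_preference(np.array([2.5, 2.5])))
```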

For an example (based very loosely on Figure 3 in this article), suppose that I know that Alice prefers Car A to Car B, Car B to Car C, Car C to Car D, and Car D to Car A (i.e., to Alice, A>B>C>D>A).[6] I also know the weight (in thousands of pounds) and perceived “sportiness” (as rated by consumers) of the four cars, and am willing to make the simplifying assumption that these are the only factors that influenced Alice’s preferences. I could then create a plane with weight on the x axis and sportiness on the y axis, show the position of the four cars in this space, and represent Alice’s preferences with arrows pointing from each car towards the car Alice would prefer to that one, as shown below:[7]

Figure 6.

I could then infer a PVF that (1) approximately captures Alice’s known preferences, and (2) suggests what preferences Alice would have at any other point in the plane (rather than just at the four points I have data for). In this case, one seemingly plausible PVF is shown below, with the length of each blue arrow representing the strength of Alice’s preferences at the associated point. (This PVF still shows Alice’s known preferences, but this is just for ease of comparison; those known preferences are not actually part of the PVF itself.)

Figure 7.

This PVF allows us to make predictions about what Alice’s preferences would be even in situations we do not have any empirical data about. For example, this PVF suggests that if Alice had the hypothetical Car E (with a weight of ~2000 pounds and sportiness of ~55), she would prefer a car that was heavier and rated higher for sportiness. In contrast, the PVF also suggests that, if she had the hypothetical Car F (with a weight of ~6000 pounds and sportiness of ~55), she would prefer a car that was heavier and rated lower for sportiness.

Of course, these predictions are not necessarily accurate. One could likely create many other PVFs that also “appear” to roughly fit Alice’s known preferences, and these could lead to different predictions. This highlights why we wish to find a more precise/“rigorous” method to better accomplish the goal I have conceptually gestured at here.

It’s also worth noting that one could extrapolate an agent’s utility function from limited preference data by first using the method gestured at here and then using the method covered in the previous section. That is, one could gather some data on an agent’s (apparent) preferences, extrapolate a PVF that “fits” that data, and then calculate what (set of) scalar function(s) that vector field is the gradient of. That scalar function would be the agent’s extrapolated utility function.
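(Sketching that pipeline end to end with made-up numbers: suppose a fit like the one above produced the affine field below. We can then check it for curl and, when the curl is zero, integrate it into a utility function. The coefficients here are hypothetical.)

```python
# A sketch of the full pipeline on a fitted affine field (hypothetical
# coefficients): check for curl, then integrate into a utility function.
import sympy as sp

x, y = sp.symbols("x y")
Fx = 0.5 * x + 0.2 * y + 1.0  # fitted x component (made-up numbers)
Fy = 0.2 * x - 0.3 * y + 0.5  # fitted y component (made-up numbers)

curl = sp.simplify(sp.diff(Fy, x) - sp.diff(Fx, y))
print(curl)  # zero here, so an extrapolated utility function exists

U = sp.integrate(Fx, x)
U += sp.integrate(sp.simplify(Fy - sp.diff(U, y)), y)
print(sp.expand(U))  # 0.25*x**2 + 0.2*x*y - 0.15*y**2 + 1.0*x + 0.5*y
```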

However, as noted earlier, this method only works if the PVF has no “curl”, so it would not work in the case of Alice’s preferences about cars. I will now discuss what I mean by “curl”, what implications curl has, and a rough idea for “removing” it.

Curl and inconsistent preferences

In the example above, to Alice, A>B>C>D>A. This is a case of intransitivity, or, less formally, circular or inconsistent preferences. This is typically seen as irrational, and as opening agents up to issues such as being “money pumped”. It seems that Alice would be willing to keep paying us to let her trade whichever car she has for the one she prefers to it, and to do this endlessly, going around and around in a circle, yet feeling that her preferences are being continually satisfied.

So representing preferences as vector fields is helpful for another pair of reasons. Doing so allows inconsistencies in preferences:

  1. to be directly seen (if they are sufficiently extreme)

  2. to be calculated as the vector field’s curl

This video introduces the concept of curl. Returning to the visualisation of vector fields as representing the direction in which water would flow over a certain domain, curl represents the speed and direction an object would spin if placed in the water. For example, if there is a strong clockwise curl at a certain point, a stick placed there would rotate clockwise; if there is no curl at a point, a stick placed there would not rotate (though it may still move in some direction, as represented by the vector field itself).

Note that the concepts of curl and inconsistency will also apply in less extreme cases (i.e., where an agent’s preferences do not only “chase each other around in circles”).

As noted earlier, when a vector field has curl, one cannot find its gradient. In our context, this seems logical; if an agent’s preferences are inconsistent, it seems that the agent cannot have a true utility function, and that we can’t assign any meaningful “height” to any point in the 2D state space. Consider again the example of Alice’s preferences for cars; if we were to interpret meeting her preferences as moving “uphill” on a utility function, she could keep arriving back at the same points in the state space and yet be at different “heights”, which doesn’t seem to make sense.
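(A purely rotational field makes this concrete. The example below is mine, chosen as a minimal analogue of Alice’s circular preferences.)

```python
# A minimal demonstration that a field with curl cannot be a gradient.
import sympy as sp

x, y = sp.symbols("x y")
Fx, Fy = -y, x  # a purely rotational field: arrows circle the origin

curl = sp.simplify(sp.diff(Fy, x) - sp.diff(Fx, y))
print(curl)  # 2: nonzero everywhere, so no scalar "height" function exists
```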

Removing curl to create consistent utility functions

It seems that agents frequently have intransitive preferences, and thus that their PVFs will often have some curl. It would therefore be very useful to have a method for “removing curl” from a PVF, to translate an intransitive set of preferences into a transitive set of preferences, while making a minimum of changes. This new, consistent PVF would also then allow for the generation of a corresponding utility function for the agent.[8]

We believe that this process should be possible. We also believe that, if developed and confirmed to make sense, it could be useful for various aspects of AI alignment (among other things). In particular, it could help in:

  • extrapolation of a consistent “core” (and corresponding utility function) from inconsistent human preferences (which could then inform an AI’s decisions)

  • adjustment of an AI’s inconsistent preferences (either by engineers or by the AI itself), with a minimum of changes being made

We have not yet implemented this process for removing curl. But we believe that the Helmholtz theorem should work, at least for PVFs in 3 or fewer dimensions (and we believe that a higher dimensional generalization probably exists). The Helmholtz theorem:

states that any sufficiently smooth, rapidly decaying vector field in three dimensions can be resolved into the sum of an irrotational (curl-free) vector field and a solenoidal (divergence-free) vector field; this is known as the Helmholtz decomposition or Helmholtz representation. (Wikipedia)

This irrotational (curl-free) vector field would then be the consistent projection (in a CEV-like way) of the agent’s preferences (from which the agent’s utility function could also be generated, in the manner discussed earlier).
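(As a very rough numerical illustration of the idea, not a validated method for real PVFs: in 2D, and under the strong assumption that the field is sampled on a periodic grid, the decomposition can be computed with Fourier transforms. The helper below and its example field are mine.)

```python
# A rough numerical sketch of extracting the curl-free part of a 2D field
# via the Helmholtz (Hodge) decomposition, assuming a periodic grid.
import numpy as np

def curl_free_part(u, v):
    """Project a 2D vector field (u, v) onto its curl-free component."""
    ny, nx = u.shape
    kx = np.fft.fftfreq(nx) * 2 * np.pi
    ky = np.fft.fftfreq(ny) * 2 * np.pi
    KX, KY = np.meshgrid(kx, ky)
    k2 = KX**2 + KY**2
    k2[0, 0] = 1.0  # avoid 0/0; the k = 0 mode of `dot` is zero anyway

    u_hat, v_hat = np.fft.fft2(u), np.fft.fft2(v)
    dot = KX * u_hat + KY * v_hat  # k . F_hat, per Fourier mode
    # Keep only the component of each mode parallel to k (the gradient part).
    return (np.real(np.fft.ifft2(KX * dot / k2)),
            np.real(np.fft.ifft2(KY * dot / k2)))

# Example: a gradient field (cos x, sin y) plus a divergence-free,
# "curly" field (sin y, -sin x); the projection recovers the former.
n = 64
xs = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(xs, xs)
u = np.cos(X) + np.sin(Y)
v = np.sin(Y) - np.sin(X)
u_cf, v_cf = curl_free_part(u, v)
assert np.allclose(u_cf, np.cos(X)) and np.allclose(v_cf, np.sin(Y))
```

Whether this kind of projection makes the “minimum of changes” in the sense we want, and how to generalise it to non-periodic state spaces and higher dimensions, are among the open questions listed below.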

Uncertainties and areas for further research

The following are some areas we are particularly interested in getting comments/feedback on, seeing others explore, or exploring ourselves in future work:

  • Are there any flaws or misleading elements in the above analysis? (As noted earlier, this is essentially just an initial exploration of some tools/concepts.)

  • To what extent do the methods used and claims made in this post generalise to higher-dimensional spaces (e.g., when we wish to represent preferences over more than two factors at the same time)? To what extent do they generalise to graphs of states that don’t correspond to any normal geometry?

  • Is there an existing, rigorous/precise method for extrapolating a PVF from a limited number of known preferences (or more generally, extrapolating a vector field from a limited number of known vectors)? If not, can a satisfactorily rigorous/precise method be developed?

  • Are there meaningful and relevant differences between the concepts of curl in vector fields and of intransitivity, inconsistency, irrationality, and incoherence in preferences? If so, how does that change the above analysis?

  • Is it possible to “remove curl” in the way we want, in the sort of situations we’re interested in (in particular, not only in three dimensions)? If so, how, specifically?

  • What other implications do the above ideas have? E.g., for rationality more generally, or for how to interpret and implement preference utilitarianism. (Above, I mostly just introduced the ideas, and hinted at a handful of implications.)

  • What other uses could these “tools” be put to?


  1. It appears some prior work (e.g., this and this) has explored the use of vector fields to represent preferences. Unfortunately, I haven’t yet had time to investigate this work, so there may be many useful insights in there that are lacking in this post. ↩︎

  2. Of course, there are often far more than two key factors influencing our preferences. In such cases, a vector field over more dimensions can be used instead (see here for an introduction to 3D vector fields). I focus in this post on 2D vector fields, simply because those are easier to discuss and visualise. We expect many of the ideas and implications covered in this post will be similar in higher dimensional vector fields, but we aren’t yet certain about that, and intend to more carefully consider it later. ↩︎

  3. For both this example and most others shown, the precise equations used were chosen quite arbitrarily, basically by trying equations semi-randomly until I found one that roughly matched the sort of shape I wanted. For those interested, I have screenshots of all equations used, in their order of appearance in this post, here. To create the visuals in this post, I entered these equations into Grapher (for those interested in trying to do similar things themselves, I found this guide useful). I discuss below, in the section “Extrapolating PVFs (and utility functions) from specific preference data”, the issue of how to actually generate realistic/accurate PVFs in the first place. ↩︎

  4. It’s possible that here I’m conflating the concepts of conservative, irrotational, and curl-free vector fields in a way that doesn’t make sense. If any readers believe this is the case, and especially if they believe this issue changes the core ideas and implications raised in this post, I would appreciate them commenting or messaging me. ↩︎

  5. Technically, the vector field is the gradient of a class of functions, with the functions differing only in their constant term. This is because the gradient only relates to differences in height (or roughly analogous ideas, in higher-dimensional cases), not to absolute heights. One can imagine raising or lowering the entire scalar function by the same constant without affecting the gradient between points. (I show in this document examples of what this would look like, while in this post itself I keep all constants at 0.) Thus, in one sense, a PVF does not fully specify the associated utility function representation, but the constant can be ignored anyway (as utility functions are unique up to positive affine transformations). ↩︎

  6. I have purposefully chosen a set of circular (or “intransitive”) preferences, as the next section will use this example in discussing the problem of circularity and how to deal with it. ↩︎

  7. Note that, in this example, I am not assuming any knowledge about the strength of Alice’s preferences, only about their direction. As such, the length of the arrows representing Alice’s known preferences has no particular meaning. ↩︎

  8. In conversation with Justin, Linda Linsefors mentioned having had a somewhat similar idea independently. ↩︎