Alignment problems for economists

AI alignment is a multidisciplinary research program. This means that potentially relevant knowledge and skill is scattered across different disciplines. But it also means that people schooled in only one narrow discipline face a hurdle when they try to work on a problem in AI alignment. One such discipline is economics, from which decision theory and game theory originated.

In this post I want to explore the idea that we should try to create a collection of “alignment-problems-for-economists”, packaged in a way that economists who have relevant knowledge and skill but don’t understand ML/CS/AF can work on them.

There seem to be sub-problems in AI alignment that economists might be able to work on. Of the economists I have spoken to, some are enthusiastic about this, but see working on it as a personal career risk because they do not understand the computer science. So if we can take sub-problems in alignment and package them in a way that lets economists start working on them immediately, we might be able to draw on intellectual resources (economists) that would otherwise have gone toward something different.

Two types of economists to target

1. Economists who also understand basic ML/CS to some degree.

2. Economists who do not.

I don’t find it very plausible that we could find sub-problems for the second type to work on, but it doesn’t seem entirely impossible: there could be specific problems in mechanism design or social choice, say, that would be useful for alignment but don’t require any ML/CS.

Desirable properties of alignment-problems-for-economists:

1. Publishable in economics journals. I have spoken to economists who are interested in the alignment problem but hesitant to work on it: it is a risky career move to work on alignment if they cannot publish in the journals they are used to.

2. High work/statement ratio. How long does it take to solve the problem, compared to stating it? If 90% of the work consists of stating the problem in a form an economist can work on, then packaging it is probably not efficient. It should be a problem that can be communicated clearly to an economist relatively easily, while taking considerably more time to solve.

3. No strong reliance on CS/ML tools. Many economists are somewhat familiar with basic ML techniques, but if a problem relies too heavily on knowledge of CS or ML, this increases the career risk of working on it.

4. Not necessarily specifically x-risk related. If a problem in alignment is not specifically related to x-risk, it is less embarrassing (or not embarrassing at all) to work on, and therefore less of a career risk. That said, most problems in AI alignment seem important even if you don’t believe that AI poses an x-risk, so I don’t think this requirement is that important.

5. Does not have to be high-impact. If a problem has only a small chance of being somewhat impactful, it might still be worth packaging it as an economics problem, since the economists who could work on it would otherwise not work on alignment problems at all.

I do not yet have a list of such problems, but it seems possible to make one:

For example, economists might work on problems in mechanism design and social choice for AGIs in virtual containment. Can we create mechanisms with desirable properties for the amplification phase in Christiano’s program, to align a collection of distilled agents? Can we prove that such mechanisms are robust under certain assumptions? Can we create mechanisms that robustly incentivize AGIs with unaligned utility functions to tell us the truth? Can we use social choice theory to derive properties of agents that consist of sub-agents?
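To make the truth-telling question a bit more concrete (this is a toy illustration of my own, not a mechanism from any alignment proposal): a strictly proper scoring rule, such as the logarithmic score, pays an agent the most in expectation exactly when it reports its true beliefs. The small Python sketch below checks this numerically for a two-outcome case.

```python
import numpy as np

def log_score(report, outcome):
    # Logarithmic scoring rule: reward is the log of the probability
    # the agent assigned to the outcome that actually occurred.
    return np.log(report[outcome])

def expected_score(report, true_belief):
    # Expected reward of announcing `report` when the agent's true
    # belief over outcomes is `true_belief`.
    return sum(p * log_score(report, i) for i, p in enumerate(true_belief))

# A hypothetical agent that believes outcome 0 occurs with probability 0.7.
true_belief = np.array([0.7, 0.3])

# Search over a grid of possible (mis)reports.
candidates = [np.array([q, 1.0 - q]) for q in np.linspace(0.01, 0.99, 99)]
best = max(candidates, key=lambda r: expected_score(r, true_belief))

print("best report found:", best)  # approximately [0.7, 0.3]: truth-telling wins
print("expected score of truth:", expected_score(true_belief, true_belief))
print("expected score of best grid report:", expected_score(best, true_belief))
```

The interesting alignment-flavored questions are whether properties like this survive the much messier assumptions of the settings above, not the textbook result itself.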

Economists work on strategic communication between agents (cheap talk), which might be helpful in the design of safe containment systems for non-superintelligent AGIs. Information economics studies the game-theoretic properties of different allocations of information, and might be useful in such mechanisms as well. Economists also work on voting and decision theory.
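Again purely as an illustration of my own (a standard social-choice result, not anything specific to alignment): for agents with single-peaked preferences over a one-dimensional policy space, selecting the median of the reported ideal points is strategy-proof (Moulin 1980). Robustness-to-misreporting properties of this kind are the sort of thing one might hope to prove for mechanisms used inside an alignment scheme.

```python
import statistics

def median_mechanism(reported_peaks):
    # Pick the median of the reported ideal points. With single-peaked
    # preferences, no voter can move the outcome closer to their own
    # peak by misreporting (Moulin 1980).
    return statistics.median(reported_peaks)

# Three agents with single-peaked preferences over a 1-D policy space.
true_peaks = [0.2, 0.5, 0.9]
print("outcome with truthful reports:", median_mechanism(true_peaks))  # 0.5

# The agent whose peak is 0.9 exaggerates to pull the outcome upward...
misreport = [0.2, 0.5, 5.0]
print("outcome after the misreport:", median_mechanism(misreport))     # still 0.5
```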

I want your feedback:

1. What kinds of problems have you encountered that might be added to this list?

2. Do you have reasons to think that this project is doomed to fail (or reasons to think it is not)? If so, I want to find out as soon as possible, so as not to waste time on it. Despite having written this post, I don’t assign a high probability of success, but I’d like to hear people’s views.