Fundamental Philosophical Problems Inherent in AI discourse

Historically, AI has had several winters, in which hype bubbles rose and collapsed. Its history is also full of pronouncements and predictions such as:

“We think that a sig­nifi­cant ad­vance can be made in one or more of these prob­lems if a care­fully se­lected group of sci­en­tists work on it to­gether for a sum­mer.”

There was also an important philosophical confusion that plagued AI in the 1970s. It has been discussed in depth in a great text, “Artificial Intelligence Meets Natural Stupidity”, as well as here. The confusion centered on attempts to represent knowledge in diagrams like these:

[Diagram with an IS-A link]

An even sillier example of “knowledge”:

[Diagram with an IS-A link]

From the pa­per:

“the tendency to see in natural language a natural source of problems and solutions. Many researchers tend to talk as if an internal knowledge representation ought to be closely related to the ‘corresponding’ English sentences; and that operations on the structure should resemble human conversation or ‘word problems’.”

This is a good ex­am­ple of how a se­ri­ous philo­soph­i­cal con­fu­sion about how much knowl­edge can be “con­tained in” a state­ment drove a large amount of re­search into sys­tems that later proved un­work­able.

What philo­soph­i­cal as­sump­tions is AI re­search fun­da­men­tally mis­taken about to­day?

Right now, the dominance and usefulness of current deep learning neural networks make it seem that they capture some key pieces of general reasoning and could lead us to a more general intelligence. Sarah Constantin disagrees. David Chapman isn’t sure either.

I wish to highlight several other politico-philosophical problems that are inherent in writing about AI and AI safety. If the AI researchers of the past believed that writing “happiness is-a state of mind” led to good reasoning, there are likely similar issues that occur today.

The cur­rent big prob­lems are:

1. Mi­sun­der­stand­ing of “bias”

2. Con­tra­dic­tory treat­ment of hu­man trustworthiness

3. Deep con­fu­sions about the na­ture of “val­ues”

4. Overem­pha­sized in­di­vi­d­u­al­ity and mi­s­un­der­stand­ing the na­ture of status

5. Over­re­li­ance on sym­bolic in­tel­li­gence, es­pe­cially in the moral realm

6. Punt­ing the prob­lems elsewhere

1. Hu­man Bias

The idea that hu­mans are “bi­ased” doesn’t sound too con­tro­ver­sial. That de­scrip­tion usu­ally arises when a per­son fails to fulfill a model of ra­tio­nal­ity or ethics that an­other per­son holds dear. There are many mod­els of ra­tio­nal­ity from which a hy­po­thet­i­cal hu­man can di­verge, such as VNM ra­tio­nal­ity of de­ci­sion mak­ing, Bayesian up­dat­ing of be­liefs, cer­tain de­ci­sion the­o­ries or util­i­tar­ian branches of ethics. The fact that many of them ex­ist should already be a red flag on any in­di­vi­d­ual model’s claim to “one true the­ory of ra­tio­nal­ity.”

However, the usage of the term “bias” in popular culture generally indicates something else: a real or imagined crime of treating people of different categories in a manner not supported by the dominant narrative, as in “police are biased against a certain group” or “tech workers are biased against a certain group.” In this case, the phrase “those humans are biased” is a synonym for “this is a frontline in the current culture war” or “those humans are low status.” Asking the actual questions of whether the said group is biased or not, or whether its bias is the only causal explanation for some effect, gets you fired from your job. As a result, a more rational discourse on the relationship between human reasoning and the ideal can’t happen in the tech industry.

This question has entered the AI discourse from several directions: from narrow AI being used today in the justice system, from a similar concern about potential narrow AI applications in the future (see the DeepMind paper), and from the discussion of “human values.”

As a Jacobin piece has pointed out, the notion of mathematical bias, such as a propensity to over- or under-estimate, is completely different from what journalists consider “bias.” A critique of ProPublica is not meant to be an endorsement of a Bayesian justice system, which is still a bad idea because it punishes things correlated with bad actions instead of the bad actions themselves. The question of bias is being used badly on both sides of the debate about AI in the justice system. On one hand, the AI could be biased; on the other hand, it could be less biased than the judges. The over-emphasis of equality as the only moral variable is, of course, obscuring the question of whether the justice system is optimally fulfilling its real role as a deterrent of further escalations of violence.

Even as a simple mathematical construct, the notion of “bias” is over-emphasized. Bias and variance are both sources of error, but the negative emotional appeal of “bias” has made people look for estimators that minimize it, instead of minimizing overall error (see Probability Theory: The Logic of Science, Chapter 17).
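To make the trade-off between bias and overall error concrete, here is a minimal simulation sketch. It is not taken from the book cited above. It assumes normally distributed data with a true variance of 1 and compares three variance estimators that differ only in their divisor; the function name `simulate`, the sample size and the trial count are illustrative choices.

```python
import random


def simulate(n=5, trials=20000, true_var=1.0, seed=0):
    """Estimate bias and mean squared error of three variance estimators.

    Each estimator divides the sum of squared deviations by n + offset,
    for offset in (-1, 0, +1). Data are drawn from a normal distribution
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    offsets = (-1, 0, 1)
    err_sum = {off: 0.0 for off in offsets}
    sq_err_sum = {off: 0.0 for off in offsets}
    sigma = true_var ** 0.5
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)
        for off in offsets:
            err = ss / (n + off) - true_var
            err_sum[off] += err
            sq_err_sum[off] += err * err
    # Map each offset to (bias, mean squared error).
    return {off: (err_sum[off] / trials, sq_err_sum[off] / trials)
            for off in offsets}


results = simulate()
for off, (bias, mse) in results.items():
    print(f"divisor n{off:+d}: bias={bias:+.3f}  mse={mse:.3f}")
```

Under these assumptions the divisor n - 1 gives an estimator with roughly zero bias, while the divisor n + 1 gives a smaller mean squared error: accepting a small systematic bias buys a larger reduction in variance.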

Ba­si­cally, there is a big differ­ence be­tween these con­cepts:

a) Tendency of an estimator to systematically over- or under-estimate a variable

b) Ab­solute er­ror of a ma­chine learn­ing sys­tem

c) Hu­man er­ror in sim­ple math­e­mat­i­cal prob­lems

d) Hu­man sen­si­tivity to fram­ing effects and other cog­ni­tive bi­ases (cog­ni­tive fea­tures?)

e) Discrep­ancy be­tween re­vealed and stated prefer­ences, or sig­nal­ing vs acting

f) Divergence from a perfect model of homo economicus that may or may not have bearing on whether a particular government scheme is better than a market scheme

g) Gen­er­ally imag­ined er­ror at­tribut­ing di­verg­ing group out­comes to foul play (see James Damore)

h) Cor­rect un­der­stand­ing of differ­ences be­tween groups, such as IQ that is poli­ti­cally in­cor­rect to talk about (Charles Mur­ray)

i) Real er­ror of failing to model other minds cor­rectly

j) General favoritism towards one’s ingroup

k) General favoritism towards one’s ingroup’s ideas at the expense of truth

l) Real or imag­ined er­ror of hu­mans act­ing against their per­ceived in­ter­est (how dare poor x race vot­ers vote for Trump?)

m) Real act of peo­ple op­pos­ing Power (or merely ex­ist­ing) and Power us­ing the per­ceived wrong of bias to sub­ju­gate peo­ple to hu­mil­i­at­ing rit­u­als. (Un­con­scious bias re­train­ing in the work­place)

Even though there is only a single word that could be used to describe these, they are different mathematical constructs and have to be treated differently. The danger of the last one pretending to be any of the previous ones means that any notion of “people being biased” might be wrong, since it might be in someone’s interest to make other people seem less rational than they are.

Paul Chris­ti­ano has men­tioned that mod­el­ling “mis­takes” has high com­plex­ity:

“I don’t think that writing down a model of human imperfections, which describes how humans depart from the rational pursuit of fixed goals, is likely to be any easier than writing down a complete model of human behavior.”

I think that on the surface this statement is true, but it also rests on a fundamentally mistaken assumption about what “mistakes” even are or what goal they might serve. People’s mistakes are still always a product of some process, a process that “thinks” it is right. Whether there is a contradictory set of assumptions about the world, two subagents within a person fighting each other, or two groups of humans vying to influence the person, the notion of “mistake” is still somewhat relative to which one is winning in the moment. The Who-Whom of reason, so to speak. True reconciliation of sub-agent power struggles is a lot more complex than calling one of them a “mistake.” “Rational pursuit of fixed goals” is also an unlikely model to want to force on a human, from both the inside (conflicting desires) and outside (belonging to a community) views of humanity.

Where does that leave us? The emphasis on object-level bias in discussions about AI has easily led to a meta-level bias, where AI researchers start out by assuming that other people are more wrong about their lives or the world than the researchers themselves are. This is extremely dangerous, as it provides Yet Another Excuse to ignore common sense or to build false assumptions about reasoning or causality into an AI.

For example, the following question is dangerous to ask: where does supposed “bias” come from? Why did evolution allow it to exist? Is it helpful genetically? While it’s easy to blame evolutionary adaptation, lack of brain power or adaptation execution, it’s a little harder to admit that somebody else’s error is not a pattern matcher misfiring, but a pattern matcher working correctly.

This brings me to an­other po­ten­tial er­ror, which is:

2. Con­tra­dic­tory treat­ment of hu­man trustworthiness

This is clearly seen around the change of heart that Eliezer had around cor­rigi­bil­ity. In “AI as a pos­i­tive and nega­tive fac­tor in global risk,” the plan was to not al­low pro­gram­mers to in­terfere in the ac­tion of the AI once it got started. This was clearly changed in fol­low­ing years, as the de­signs be­gan to in­clude a large cor­rigi­bil­ity but­ton that based on the judge­ment of the pro­gram­mers would turn off the AI af­ter it started.

A complex issue arises here: what exact information does the programmer need to consume to perform the task effectively, and how should disagreements among the programmers about whether to turn off the AI be handled?

But, even more generally, this exposes two conflicting points of view on how much we would trust people to interfere with the operation of an AI after it has been created. The answers range from complete trust to none.

The confusion is somewhat understandable, and we have analogies today that push us one way or another. We certainly trust a calculator to compute a large arithmetic task more than we trust a calculator programmer to compute the same task. We still don’t trust a virtual assistant made to emulate someone’s email responses more than the person it is emulating.

Paul Christiano’s capability amplification proposal contains the same contradiction:

“We don’t want such systems to simply imitate human behavior; we want them to improve upon human abilities. And we don’t want them to only take actions that look good to humans; we want them to improve upon human judgment.”

What is the positive vision here? Is it an AI making complex judgement calls and sometimes being overruled by its controllers? Or is it an AI ruling the world in a completely unquestioned manner?

On one hand, I understand the concern: if the AI only takes actions that “look good” to humans, then it might be as good as a slimy politician. However, if the AI takes a lot of actions that don’t look good to humans, what can the humans do to counteract it? If the answer is “nothing,” this conflicts with corrigibility, or with the ability to improve on the AI design by observing negative object-level actions. The ability to “improve” on human judgement implies action without human oversight and a lack of trust in humans. At the same time, capability amplification generally starts out with an assumption that humans are aligned.

In other words, these two principles are nearly impossible to satisfy together:

a) AI can do ac­tions that don’t look good to its con­trol­lers, aka AI is able to im­prove on hu­man judgement

b) AI is cor­rigible and can be shut down and re­designed when it does ac­tions that don’t look good to its con­trol­lers.
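The tension between (a) and (b) can be illustrated with a toy sketch. Everything in it is hypothetical: the `Action` record, the `looks_good` and `actually_good` flags, and the shutdown rule are assumptions made for illustration, not part of any proposal discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    looks_good: bool     # how the controllers judge the action
    actually_good: bool  # ground truth, visible only to this simulation


def run(plan, corrigible):
    """Execute actions in order; return the names of those carried out.

    If `corrigible` is True, the controllers shut the AI down right after
    the first action that does not look good to them (principle b).
    """
    executed = []
    for action in plan:
        executed.append(action.name)
        if corrigible and not action.looks_good:
            break  # shutdown: nothing later in the plan happens
    return executed


plan = [
    Action("routine task", looks_good=True, actually_good=True),
    Action("counterintuitive step", looks_good=False, actually_good=True),
    Action("payoff of that step", looks_good=True, actually_good=True),
]

with_shutdown = run(plan, corrigible=True)
without_shutdown = run(plan, corrigible=False)
```

With the shutdown rule, the plan stops at the step the controllers dislike, so its payoff is never reached. Without the rule the plan completes, but the controllers would have had no way to stop a step that looked bad and really was bad.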

This problem is in some ways present even now, as particular proposals, such as IRL (inverse reinforcement learning), could be rejected based on claims that they emulate humans too well and humans can’t be trusted.

For ex­am­ple, ear­lier in this post, there is also this tid­bit.

“If you were to find any principled answer to ‘what is the human brain optimizing?’ the single most likely bet is probably something like ‘reproductive success.’ But this isn’t the answer we are looking for.”

Why not? What is the philosophical objection to considering the plausible outcome of many of the algorithms under consideration, such as IRL or even a capability-amplified human (which would then presumably be better at “reproductive success”)? Is it that “reproductive success” is a low status goal? Is it that we are supposed to pretend to care about other things while still optimizing for it? Are we forbidden to see the fundamental drives due to their ability to cause more mimetic conflict? Is “reproductive success” unsupported as a goal by the current Power structures? None of these seem compelling enough reasons to reject it as a potential outcome of algorithms fundamentally understanding humans, or to reject algorithms that have a chance of working just because some people are uncomfortable with the evolutionary nature of humans.

The point isn’t to ar­gue for max­i­miza­tion of “re­pro­duc­tive suc­cess”. Hu­man val­ues are a bit more com­plex than that. The point is that it’s philo­soph­i­cally dan­ger­ous to think “we are go­ing to re­ject al­gorithms that work on un­der­stand­ing hu­mans be­cause they work and ac­cept­ing re­al­ities of evolu­tion is seen as low sta­tus.”

3. Speak­ing of hu­man “val­ues”

If there is one abstraction that is incredibly leaky and likely causing a ton of confusion, it is that of “human values.” The first question: what mathematical type of thing are they? If you consider AI utility functions, then a “human value” would be some sort of numeric evaluation of world states or world state histories. Something like “hedonic utilitarianism” approaches this category. However, this is not what is colloquially meant by “values”, a category which has a hugely broad range.
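As a rough sketch of what “mathematical type” means here, the utility-function reading can be written as a type signature. The names `WorldState`, `History`, `Utility` and `hedonic_utility`, and the choice to represent a world state as a mapping of named quantities, are assumptions made purely for illustration.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical representation: a world state is a bag of named quantities.
WorldState = Mapping[str, float]
History = Sequence[WorldState]

# The "AI utility function" reading of a value: history in, number out.
Utility = Callable[[History], float]


def hedonic_utility(history: History) -> float:
    """Toy hedonic utilitarianism: total pleasure minus total pain."""
    return sum(s.get("pleasure", 0.0) - s.get("pain", 0.0) for s in history)


history = [
    {"pleasure": 2.0, "pain": 0.5},
    {"pleasure": 1.0, "pain": 1.0},
]
score = hedonic_utility(history)

# A colloquial "value" is just a label; nothing here says how to turn it
# into a number over world histories.
colloquial_value = "family values"
```

Something like `hedonic_utility` fits the `Utility` signature and can be maximized. A label such as `"family values"` has no such signature, which is the mismatch the paragraph above points at.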

We then have 4 separate things that are meant by “values” in today’s culture:

a) Col­lo­quial /​ poli­ti­cal us­age “fam­ily val­ues” /​ “west­ern val­ues”

b) Cor­po­rate us­age “move fast and break things”, “be right”

c) Psy­cholog­i­cal us­age – moral foun­da­tions the­ory (care, au­thor­ity, liberty, fair­ness, in­group loy­alty, pu­rity)

d) AI /​ philo­soph­i­cal us­age ex­am­ple—“he­do­nic util­i­tar­i­anism” – gen­er­ally a thing that *can* be maximized

There are other sub­di­vi­sions here as well. Even within “cor­po­rate val­ues”, there is lack of agree­ment on whether they are sup­posed to be falsifi­able or not.

As far as the moral foundations go, they are not necessarily a full capture of human desires, or even necessarily a great starting point from which to try to derive algorithms. However, given that they do capture *some* archetypal concerns of many people, it’s worth checking how any AI designs take them into account. It seems that they are somewhat ignored because they don’t feel like the necessary type signature. Loyalty and Authority may seem like strongly held heuristics that enable coordination between humans towards whatever other goals humans might want. So, for example, not everybody in the tribe needs to be a utilitarian, if the leader of the tribe is a utilitarian and the subjects are loyal and follow his authority. This means that acceptance of the leader as the leader is a more key fact of a person’s moral outlook than their day-to-day activity without anyone else’s involvement. Loyalty to one’s ingroup is *of course* completely ignored in narrow AI designs. The vision that is almost universally paraded is that of “bias being bad”: bias needs to be factored out of human judgement by any means necessary.

There is, of course, a legitimate conflict between fairness to all and loyalty to the ingroup, and it is almost completely resolved in favor of real or fake fairness. The meta-decision by which this is resolved is political fiat, rather than philosophical debate or mathematical reconciliation based on higher meta-principles. You might read my previous statements about moral foundations and wonder: how do you maximize the purity foundation? The point isn’t to maximize every facet of human psychology; the point is to hope that one’s design does not code fundamentally false assumptions about how humans operate.

There is a broader and more complex point here. “Values” as expressed in simple words such as loyalty and liberty are simultaneously expressions of “what we truly want” as well as tools of political and group conflict. I don’t want to go full post-modern deconstruction here and claim that language has no truth value, only tribal signaling value; however, the unity of groups around ideological points obviously happens today. This creates a whole new set of problems for AIs trying to learn and aggregate human values among groups. It seems to be assumed that learning the values / world history evaluations of a single human is the hard part, and that once you have your wonderful single-person utility function, simply averaging the human preferences works. The realization that values are necessarily both reflections of universal desires AND adversarial group relations could create a lot of weird cancelling out once the “averaging” starts. There are other problems. One is intensifying existing group conflict, including conflict between countries over the control of the AI. Another is misunderstanding the nature of how the cultural norms of “good” / “bad” evolve. Group and institutional conflict forces some values to be more preferred than others, while others are suppressed. “Moral progress” in AI is already a strange term, since in some people’s minds it stands for suppressing the outgroup’s humans as humans more and more.
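The “cancelling out” can be seen in a deliberately tiny sketch. The preference dictionaries, the key names and the `average_preferences` helper are all hypothetical; real preference aggregation is far messier than a per-key mean.

```python
def average_preferences(population):
    """Average per-key weights across people; a missing key counts as 0."""
    keys = sorted({key for person in population for key in person})
    n = len(population)
    return {key: sum(p.get(key, 0.0) for p in population) / n for key in keys}


# Two people who share one universal desire and hold one adversarial
# preference each, aimed at the other group.
person_a = {"health": 1.0, "group_a_wins": 1.0, "group_b_wins": -1.0}
person_b = {"health": 1.0, "group_a_wins": -1.0, "group_b_wins": 1.0}

averaged = average_preferences([person_a, person_b])
```

In this toy case the shared desire survives averaging while the adversarial terms sum to zero, so the aggregate no longer reflects something both people cared about strongly.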

This could be avoided if you focus more on the intersection of values or on human universals instead of trying to hard-code liberal assumptions. Intersection has a potentially lower chance of causing earlier fights over the control of the AI compared to the union. That solution has problems of its own. Human “universals” are still subject to the time frame you are considering. Intersections could be too small to be relevant, some cultures are, in fact, better than others, etc. There could be genetic components that alter the ideal structure of society, which means that “value clusters” might be workable based on the various genetic make-ups of the people. Even without that, there might be some small “wiggle-room” in values, even though there are definitive constraints on what culture any group of humans could plausibly adopt and still maintain a functional civilization.

There is not a sim­ple solu­tion, but it always needs to start with a more thor­ough causal model of how power dy­nam­ics af­fect moral­ity and vice versa.

At the end of the day, the hard work of value philos­o­phy rec­on­cili­a­tion might still need to be done by hy­per-com­pe­tent hu­man philoso­phers.

It’s a bit surreal to read the AI papers with these problems in mind. Let’s learn human values, except we consider humans biased, so we can’t learn loyalty; we over-emphasize individualism, so we can’t really learn “authority.” We don’t understand how to reconcile the desire to not be told what to do with our technocratic designs, so we can’t learn “liberty”. “Reproductive success” is dismissed without explanation. I wonder if they are disguised by the purity foundation as well.

The last confusion that is implied by the word “values” is that humans are best approximated by rational utility maximizers, where there is some hidden utility function “inside them.” The problem isn’t that a “rational values maximizer” is that terrible of a model for some EAs who are really dedicated to doing EA; it is that the model misses most people, who are neither ideologically driven nor that time-consistent, who expect themselves to evolve significantly, or who depend on others for their moral compass. It also fails to capture people who are more “meta-rational” or “post-ideological” and who are less keen to pursue certain values due to the adversarial nature of them. In other words, people at various stages of emotional and ethical maturity might need a different mathematical treatment of what they consider valuable.

These are not merely theoretical problems. It seems that the actual tech industry, including OpenAI, has adopted a similar “values” language. In an appropriately Orwellian fashion, the AI represents the “values” of the developers, which thus requires “diversity.” The AI representing the values of people with power over the developers is probably closer to the truth, of course. The same “diversity”, which is supposed in theory to represent the diversity of ideas, is then taken to mean absence of whites and Asians. There are many problems here, but the confusion around the nature of “values” is certainly not helping.

4. Overemphasized individuality and misunderstanding the nature of status

A key con­fu­sion that I al­luded to in the dis­cus­sion of au­thor­ity is the prob­lem of learn­ing about hu­man prefer­ences from an in­di­vi­d­ual hu­man iso­lated from so­ciety. This as­sump­tion is pre­sent nearly ev­ery­where, but it gets even worse with things like “ca­pa­bil­ity am­plifi­ca­tion” where “a hu­man think­ing for an hour” is taken to rep­re­sent a self-con­tained unit of in­tel­li­gence.

The issue is that humans are social animals. In fact, the phrase “social animals” vastly under-estimates the social nature of cognition and evaluation of “goodness.” First, people want things *because* other people want them. Second, people clearly want status, which is a place in the mind of the Other. Last, but not least, people follow others with power, which means that they are implementing an ethic or an aesthetic which is not necessarily their own in isolation. This doesn’t mean only following others with high amounts of power. It’s just that the expectations that people we care about have of us shape a large amount of our behavior. The last point is not a bad thing most of the time, since many people go crazy in isolation and thus the question of “what are their values in isolation?” is not a great starting point. This feeds into the question of bias, as people might be an irrational part of a more rational super-system.

What does this have to do with AI designs? Any design that at its heart wishes to benefit humanity must be able to distinguish between:

a) I want the bag (make more bags)

b) I want the feel­ing of be­ing-bag-owner (make more drugs that simu­late that feel­ing)

c) I want the ad­mira­tion of oth­ers that comes with hav­ing the bag, whether or not I know about it (make ad­mira­tion max­i­miz­ers)

d) I want to ap­pease pow­er­ful in­ter­ests that wish for me to own the bag (max­i­mize pos­i­tive feel­ings for those with power over me)

e) Other pos­si­bil­ities, in­clud­ing what “you” re­ally want

You can imag­ine that there can be some amount of rec­on­cil­ing some of those into a co­her­ent vi­sion, but this is prone to cre­at­ing a wrong-thing max­i­mizer.

Failure modes of max­i­miz­ing these are very rem­i­nis­cent of gen­eral prob­lems of moder­nity. Too much stuff, too many drugs, too much so­cial me­dia, etc.

At the end of the day, hu­man be­hav­ior, in­clud­ing think­ing, does not make that much sense out­side the so­cial con­text. This prob­lem is helped but isn’t sim­ply solved by ob­serv­ing hu­mans in the nat­u­ral en­vi­ron­ment, in­stead of clean-rooms.

5. Overreliance on symbolic intelligence, especially in the moral realm

This problem is exactly analogous to the problem referenced in the introduction. Establishing “is-a” relationships or classifying concepts into other conceptual buckets is part of what the human brain does. However, it’s not a complete model of intelligence. While the assumption that a system emulating this would lead to a complete model of intelligence is no longer present, parts of the misunderstanding remain.

AI alignment through debate has this issue. From OpenAI:

“To achieve this, we reframe the learning problem as a game played between two agents, where the agents have an argument with each other and the human judges the exchange. Even if the agents have a more advanced understanding of the problem than the human, the human may be able to judge which agent has the better argument (similar to expert witnesses arguing to convince a jury).”

“Our hope is that, properly trained, such agents can produce value-aligned behavior far beyond the capabilities of the human judge”

I understand the rationale here, of course. The human needs to have information about the internal functioning of an AI presented in a format that the human understands. This is a desirable property and is worth researching. However, the problem comes not in verifying that the argument the AI has is correct. The problem comes in making sure the AI’s words and actions correspond to each other. Imagine the old school expert system that has a “happiness is-a state of mind” node hooked up to constructing arguments. It could potentially create statements that the human might agree with. The problem is that the statement does not correspond to anything, let alone any potential real-world actions.

How does one check that the AI’s words and its actions are connected to reality in a consistent manner? You would have to have a human look at which signals the AI is sending, consider the consequences of its actions, and then judge whether the correspondence is good enough or not. However, there is no reason to believe that this is much easier than programmers trying to plot AI internals through other means such as data visualization or doing complex real-world tests.

We have this prob­lem with poli­ti­ci­ans and me­dia to­day.

This also has an ex­tra level of difficulty due to defi­ni­tions of words and their con­nec­tion to re­al­ity be­ing poli­ti­cal vari­ables to­day.

For ex­am­ple, what would hap­pen if in­stead of cats and dogs, OpenAI made a game clas­sify­ing pic­tures of men vs women?

Even that would probably raise eyebrows, but there can be other examples. What about ideas about what is an “appropriate” prom dress? What about whether an idea needs to be protected by free speech? There are many examples where the “truth” of a statement has a political character about it. It’s becoming harder and harder to find things that don’t. Once again, I don’t want to go full deconstruction here and claim that “truth” is only socially determined. There are clearly right and wrong statements, or statements more or less entangled with reality. However, the key point is that most of the cognitive work happens outside of the manipulation of verbal arguments and instead comes in both picking the right concepts and verifying that they don’t ring hollow.

There is also a general philosophical tendency to replace questions of what real meta-ethics would be with questions about “statements about ethics.” The meta-ethical divide between moral realism and other meta-ethical theories more frequently concerns statements, instead of plausible descriptions of how people’s thinking about ethics evolves over time, historical causes, or even questions like “would programmers from different cultures agree on whether a particular algorithm exhibits ‘suffering’ or not?” Wei Dai made a similar point well here.

The general point is that there is a strong philosophical tendency to tie too much of intelligence / ethics to the ability to make certain verbal statements, or to worship memes instead of real-world people. Misunderstanding of this problem is especially likely to occur since it is extremely similar to the types of problems misunderstood in the past.

6. Punting the problems elsewhere

Punting the problem is a form of “giving up” without explicitly saying that you are giving up. There are three general patterns here:

a) Giv­ing the prob­lem to some­one else (let’s find other peo­ple and con­vince them AI safety is im­por­tant /​ what if the gov­ern­ment reg­u­lated AI /​ it’s im­por­tant to have a con­ver­sa­tion)

b) Bit­ing the bul­let on nega­tive out­comes (peo­ple de­serve to die /​ un­al­igned AI might not be that bad)

c) Punt­ing too many things to the AI it­self (cre­ate a Seed AI that solves philos­o­phy)

Of all the is­sues here, this one is, in some ways, the most for­giv­able. If a reg­u­lar per­son is given the is­sue of AI safety, it is more cor­rect for them to re­al­ize that it is too difficult and to punt on it, rather than try and “solve it” with­out back­ground and get the wrong im­pres­sion of its sim­plic­ity. So, punt­ing on the prob­lem is a wrong epistemic move, but the right strate­gic move.

That said, the meta-strategy of convincing others that AI safety is important is plausible, but it still needs to terminate in the production of actual philosophy / mathematics at some point. Pursuing the meta-strategy thus cannot come at the expense of that production.

Similarly, the gov­ern­ment could, in the­ory, be helpful in reg­u­lat­ing AI, how­ever, in prac­tice, it’s more im­por­tant to work out what that policy would be be­fore ask­ing for blan­ket reg­u­la­tions, which would have difficulty strad­dling be­tween ex­tremes of over-reg­u­la­tion and do­ing prac­ti­cally noth­ing.

So, this is forgivable for some people; however, it’s less forgivable for the key players. If a CEO says that “we need to have a discussion about the issue of AI” instead of having that discussion, it’s a form of procrastinating on the problem by going meta.

Bit­ing the bul­let on nega­tive out­comes is not some­thing I see very of­ten in pa­pers, but oc­ca­sion­ally you get strange re­sponses from tech lead­ers, such as “Us­ing Face­book pas­sively is as­so­ci­ated with men­tal health risks” with­out fur­ther fol­low-up on whether the in­ter­nal op­ti­miza­tion is, in fact, driv­ing peo­ple to use the site more pas­sively or not.

Punting too many things to the AI itself is a problem in most approaches. If you have two tasks, such as “solve philosophy” and “create an algorithm that solves philosophy,” the second task is not necessarily easier than the first. To solve it, you still need ways to verify that it has done so correctly, which requires at least being able to recognize good answers in this domain.

Saying “algorithm X is able to be generally aligned”, where X is any algorithm, requires you to have a stronger level of certainty in the ability of the algorithm to produce correct answers than in your own judgement of some philosophical puzzle. Of course, this is possible with arithmetic or playing Go, so people reason by analogy that AI will be able to reason better in other domains as well. However, the question is what happens when, at some point, it produces an answer one doesn’t like. How does one acquire enough certainty about the previous correctness of the algorithm to be convinced?

In conclusion, the philosophical problems of AI are serious, and it is unlikely that they will be easily solved “by default” or with a partially aligned AI. For example, it’s hard to imagine iterating on an expert system that produces “happiness is-a state of mind” statements until it reaches neural-network-like reasoning without dropping a fundamental assumption about how much intelligence happens in verbal statements. Similarly, iterating on a design that assumes too much about how much ethics is related to belief or pronouncements is unlikely to produce real algorithms that compute the thing we want to compute.

To be able to correctly infer true human wants we need to be able to correctly model human behavior, which includes understanding many potentially unsavory theories of dominance hierarchies, mimetic desire, scapegoating, theories of the sacred, enlightenment, genetic variability, complexities of adult ethical development as well as how “things that are forbidden to see” arise.

The good news is that there is a lot of ex­ist­ing philos­o­phy that can be used to help at least check one’s de­signs against. Girard, Hei­deg­ger or Wittgen­stein could be good peo­ple to re­view. The bad news is that in the mod­ern world the liberal doc­trines of in­di­vi­d­u­al­ism and fears of in­equity have crowded out all other eth­i­cal con­cerns and the cur­rent cul­ture war has stunted the de­vel­op­ment of com­plex ideas.

How­ever, it’s kind of pointless to be afraid of speak­ing up against the philo­soph­i­cal prob­lems of AI. We are prob­a­bly dead if we don’t.
