Critiquing “What failure looks like”

I find myself somewhat confused as to why I should find Part I of “What failure looks like” (hereafter “WFLL1”, like the pastry) likely enough to be worth worrying about. I have three basic objections, although I don’t claim that any of them are decisive. First, let me summarize WFLL1 as I understand it:

In general, it’s easier to optimize easy-to-measure goals than hard-to-measure ones, but this disparity is much larger for ML models than for humans and human-made institutions. As special-purpose AI becomes more powerful, this will lead to a form of differential progress in which easy-to-measure goals get optimized well past the point where they correlate with what we actually want.
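To make the shape of that claim concrete, here is a toy numerical sketch of my own (not from the original post): a proxy metric that tracks the thing we actually care about while optimization pressure is mild, but diverges from it when the pressure is strong. Both functions and all the numbers are invented purely for illustration.

```python
# Toy sketch (invented for illustration): a proxy that correlates with the
# true objective at low levels of optimization and diverges when pushed hard.

def true_value(effort):
    # What we actually want: improves at first, then degrades past effort = 5.
    return effort - 0.1 * effort**2

def proxy_metric(effort):
    # What is easy to measure, and hence what gets optimized.
    return effort

for effort in [1, 5, 10, 20, 50]:
    print(f"effort={effort:>3}  proxy={proxy_metric(effort):6.1f}  "
          f"true value={true_value(effort):7.1f}")

# The proxy keeps rising, while the true value peaks around effort = 5 and
# then goes negative: the proxy gets optimized well past the point where it
# correlates with what we actually want.
```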

(See also: this critique, although I agree with the existing rebuttals to it.)

Objection 1: Historical precedent

In the late 1940s, George Dantzig invented the simplex algorithm, a practically efficient method for solving linear optimization problems. At the same time, the first modern computers were arriving, and he had access to them as a mathematician working for the US military. For Dantzig and his contemporaries, a wide class of previously intractable problems suddenly became solvable, and they did use the new methods to great effect, playing a major part in developing the field of operations research.

With the new tools in hand, Dantzig also decided to use simplex to optimize his diet. After carefully poring over prior work, and putting in considerable effort to obtain accurate data and correctly specify the coefficients, he was ready, and told his wife:

whatever the [IBM] 701 says that’s what I want you to feed me each day starting with supper tonight.

The result included 500 gallons of vinegar.

After vinegar was delisted as a food, the next round came back with 200 bouillon cubes per day. There were several more iterations, none of which worked, and in the end Dantzig simply went with a “common-sense” diet.
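For readers who want the mechanism rather than just the anecdote, here is a minimal sketch of the same kind of diet LP, using scipy.optimize.linprog. The foods, prices, nutrient contents, and requirements below are all invented (this is not Dantzig’s actual data or formulation); the point is only that minimizing cost subject to nutrient floors cheerfully returns an absurd pile of whichever item is cheapest per unit of constrained nutrient.

```python
from scipy.optimize import linprog

foods = ["vinegar", "bouillon", "bread", "milk"]
cost = [0.001, 0.05, 0.10, 0.15]   # dollars per serving (invented)

# Nutrients per serving (invented): rows are [calories, grams of protein].
nutrients = [
    [3,  5, 80, 120],   # calories
    [0,  1,  3,   8],   # protein
]
requirements = [2500, 70]  # daily minimums (invented)

# linprog minimizes cost @ x subject to A_ub @ x <= b_ub, so flip the signs
# to express "at least the required amount of each nutrient".
A_ub = [[-v for v in row] for row in nutrients]
b_ub = [-r for r in requirements]

result = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(foods))
for name, servings in zip(foods, result.x):
    print(f"{name:>8}: {servings:7.1f} servings/day")

# The objective was specified correctly; it just was not the objective
# anyone actually wanted.
```

With these made-up numbers, the optimum puts nearly all the servings on the cheapest calorie source (here, “vinegar”), plus a little milk to cover the protein floor.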

The point I am making is that whenever we create new methods for solving problems, we end up with a bunch of solutions looking for problems. Typically, we try to apply those solutions as widely as possible, and then quickly notice when some of them don’t solve the problems we actually want to solve.

Suppose that around 1950, we were musing about the potential consequences of the coming IT revolution. We might’ve noticed that we were entering the era of the algorithm, in which a potentially very wide class of problems could be solved, provided they could be reduced to arithmetic and run on the new machines, with their scarcely fathomable ability to memorize so much and calculate in mere moments. And we could ask “But what about love, honor or justice? Will we forget about those unquantifiable things in the era of the algorithm?” [excuse me if this phrasing sounds snarky] And yet, in the decades since, we seem to have basically just used computers to solve the problems we actually want to solve, and we don’t seem to have stopped valuing the things that fall outside their scope.

If we round off WFLL1 to “when you have a hammer, everything looks like a nail”, then this seems only mildly and benignly true of most technologies. The trend seems to be that if technology A makes us better at some class of tasks X, we poke around to see just how big X is until we’ve delineated its borders well and then stop, with the exploratory phase rarely causing large-scale harm.

I don’t think the OP intends WFLL1 to say something this broad, but then I think it needs to be clarified why “this time is different”: why, for example, modern D(R)L should be fundamentally different from linear optimization, the IT revolution, or even non-deep ML.

(I think the discontinuity-based arguments largely do make the “this time is different” case, roughly because general intelligence seems clearly game-changing. WFLL2 seems to fall somewhere in between, and I’m unsure where my beliefs land on it.)

Objection 2: Absence of evidence

I don’t see any particular evidence of WFLL1 unfolding as we conclude the 2010s. As I understand it, it should gradually “show up” well before AGI, but given how much ML is already deployed, this at least raises the question of when we should expect it to become noticeable, in terms of the capabilities the AI systems would need.

Objection 3: Why privilege this axis (of differential progress)?

It seems likely that if ML continues to advance substantially over the coming decades (at something like its 2012-2019 rate), then it will cause substantial differential progress. But along what axes? WFLL1 singles out the axis “easy-to-measure vs. hard-to-measure”, and it’s not clear to me why we should worry about this one in particular.

For instance, there’s also the axis “have massive datasets vs. don’t have massive datasets”. We could point to various examples of this form: e.g. it’s easy to measure a country’s GDP year over year, but we can get at most a few hundred data points of it, which makes it completely unsuitable for DL. So we could see, for instance, differential progress in microeconomics relative to macroeconomics.

More generally, we could list what DL needs in order to work well, and where it seems weak:

  • Performance at the task must be easy to measure

  • A massive, labelled, digitized training set must exist (or be easy to create, e.g. via self-play)

  • DL seems relatively weak at learning causality

  • (Other limitations listed by e.g. Gary Marcus)

And from there, we could reasonably extrapolate to what DL will be good/bad at, relative to the baseline of human thinking/heuristics.

WFLL1 seems to basically say: “here’s this axis of differential progress (arising from a limitation of DL), and here are some examples of ways things can go wrong as a result”. But for any other limitation we list, I suspect we could construct similar examples, such as “if DL is really capable in general but really bad at causal modeling, here’s a thing that can go wrong.”

At least to me, the ease-of-measurement bullet point does not pop out as a very natural category: interpreted broadly, it does not capture everything that seems plausibly important, and interpreted narrowly, it still does not seem narrow enough to focus our attention on any one interesting failure mode.