Critiquing “What failure looks like”

I find myself somewhat confused as to why I should find Part I of “What failure looks like” (hereafter “WFLL1”, like the pastry) likely enough to be worth worrying about. I have 3 basic objections, although I don’t claim that any are decisive. First, let me summarize WFLL1 as I understand it:

In general, it’s easier to optimize easy-to-measure goals than hard-to-measure ones, but this disparity is much larger with ML models than with humans and human-made institutions. As special-purpose AI becomes more powerful, this will lead to a form of differential progress where easy-to-measure goals become optimized well past the point where they correlate with what we actually want.

(See also: this critique, although I agree with the existing rebuttals to it).
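As a toy numeric sketch of the dynamic summarized above (every function here is invented purely for illustration, not taken from the original post): a measurable proxy that tracks the true objective under weak optimization, then diverges when pushed harder.

```python
# Toy Goodhart sketch: a proxy correlates with the true objective for
# small amounts of optimization, then comes apart under strong
# optimization. All numbers are made up for illustration.

def proxy(x):
    """Easy to measure; improves without bound as we optimize."""
    return x

def true_value(x):
    """What we actually want: the proxy minus an unmeasured cost
    that grows faster than the proxy does."""
    return x - 0.01 * x ** 2

# A weak optimizer (small search space) lands where proxy and true
# value still roughly agree...
weak = max(range(50), key=proxy)        # x = 49, true_value ≈ 25

# ...while a strong optimizer, maximizing the same proxy over a larger
# space, drives the true value far below zero.
strong = max(range(10_000), key=proxy)  # x = 9999, true_value ≈ -989,801

print(true_value(weak), true_value(strong))
```

Note that the proxy improves monotonically throughout, so anyone watching only the proxy sees continuous success; the divergence is WFLL1’s claim compressed into a one-line cost function.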

Objection 1: Historical precedent

In the late 1940s, George Dantzig invented the simplex algorithm, a practically efficient method for solving linear optimization problems. At the same time, the first modern computers were becoming available, and as a mathematician in the US military, Dantzig had access to them. For Dantzig and his contemporaries, a wide class of previously intractable problems suddenly became solvable, and they used the new methods to great effect, playing a major part in developing the field of operations research.

With the new tools in hand, Dantzig also decided to use simplex to optimize his diet. After carefully poring over prior work, and after considerable effort to obtain accurate data and correctly specify the coefficients, he was ready, telling his wife:

whatever the [IBM] 701 says that’s what I want you to feed me each day starting with supper tonight.

The result included 500 gallons of vinegar.

After delisting vinegar as a food, the next round came back with 200 bouillon cubes/day. There were several more iterations, none of which worked, and in the end Dantzig simply went with a “common-sense” diet.
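For concreteness, the failure can be reproduced in miniature (with invented numbers, not Dantzig’s actual data): a tiny diet LP whose cheapest feasible solution leans absurdly on the cheapest calorie source.

```python
from itertools import combinations

# Hypothetical toy data, not Dantzig's coefficients: cost per serving,
# and (calories, grams of protein) per serving.
foods = {
    "vinegar": (0.01, (3.0, 0.0)),
    "bread":   (0.50, (80.0, 3.0)),
    "cheese":  (1.00, (110.0, 7.0)),
}
requirements = (2000.0, 50.0)  # daily calories and protein

def cheapest_diet(foods, req):
    """Enumerate the LP's basic solutions: pick two foods, meet both
    constraints with equality via Cramer's rule, keep the cheapest
    nonnegative answer. (Simplex visits these same vertices; in this
    toy instance no optimal solution leaves a constraint slack, so
    checking pairs suffices.)"""
    best_cost, best_servings = float("inf"), None
    for (n1, (c1, (cal1, pro1))), (n2, (c2, (cal2, pro2))) in combinations(foods.items(), 2):
        det = cal1 * pro2 - cal2 * pro1
        if abs(det) < 1e-12:
            continue  # the two foods are nutritionally parallel
        x1 = (req[0] * pro2 - req[1] * cal2) / det
        x2 = (cal1 * req[1] - pro1 * req[0]) / det
        if x1 < 0 or x2 < 0:
            continue  # infeasible vertex
        cost = c1 * x1 + c2 * x2
        if cost < best_cost:
            best_cost, best_servings = cost, {n1: x1, n2: x2}
    return best_cost, best_servings

cost, servings = cheapest_diet(foods, requirements)
print(round(cost, 2), {k: round(v, 1) for k, v in servings.items()})
# The optimum meets every stated constraint with ~222 servings of
# vinegar per day: the objective was optimized, not the intent.
```

Every coefficient here is correct and every constraint is satisfied; the absurdity lives entirely in what the constraints failed to say.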

The point is that whenever we create new methods for solving problems, we end up with a bunch of solutions looking for problems. Typically, we try to apply those solutions as widely as possible, and then quickly notice when some of them don’t solve the problems we actually want to solve.

Suppose that around 1950, we were musing about the potential consequences of the coming IT revolution. We might have noticed that we were entering the era of the algorithm, in which a potentially very wide class of problems could be solved, provided they could be reduced to arithmetic and run on the new machines, with their scarcely fathomable ability to memorize a lot and calculate in mere moments. And we could ask: “But what about love, honor or justice? Will we forget about those unquantifiable things in the era of the algorithm?” [excuse me if this phrasing sounds snarky] And yet, in the decades since, we seem to have basically just used computers to solve the problems we actually want to solve, and we don’t seem to have stopped valuing the things outside their scope.

If we round off WFLL1 to “when you have a hammer, everything looks like a nail”, then this only seems mildly and benignly true in the case of most technologies, i.e. the trend seems to be that if technology A makes us better at doing some class of tasks X, we poke around to see just how big X is, until we’ve delineated the border well and stop, with the exploratory phase rarely causing large-scale harm.

I don’t think the OP intends WFLL1 to say something this broad, but then I feel it should be clarified why “this time is different”: why modern D(R)L should be fundamentally different from linear optimization, the IT revolution, or even non-deep ML.

(I think the discontinuity-based arguments largely do make the “this time is different” case, roughly because general intelligence seems clearly game-changing. WFLL2 seems somewhere in between these, and I’m unsure where my beliefs fall on that.)

Objection 2: Absence of evidence

I don’t see any particular evidence of WFLL1 unfolding as we conclude the 2010s. As I understand it, the failure should gradually show up well before AGI; but given how much ML is already deployed, this at least raises the question of when the effect should be expected to become noticeable, in terms of the necessary capabilities of the AI systems.

Objection 3: Why privilege this axis of differential progress?

It seems likely that if ML continues to advance substantially over the coming decades (at something like its 2012–2019 rate), then it will cause substantial differential progress. But along which axes? WFLL1 singles out the axis “easy-to-measure vs. hard-to-measure”, and it’s not clear to me why we should worry about this one in particular.

For instance, there’s also the axis “have massive datasets vs. don’t have massive datasets”. And we could point to various examples of this form: e.g. it’s easy to measure a country’s GDP year over year, but we can get at most a few hundred data points on it, making it completely unsuitable for DL. So, for instance, we could see differential progress on microeconomics vs. macroeconomics.

More generally, we could list the conditions DL seems to need in order to perform well (and hence the areas it is weak in):

  • Performance at the task must be easy to measure

  • A massive, labelled, digitized training set must exist (or be easy to create with e.g. self-play)

  • The task must not lean heavily on causal modeling, which DL seems relatively weak at learning

  • (Other conditions implied by the limitations listed by e.g. Gary Marcus)

And from there, we could reasonably extrapolate to what DL will be good/bad at, relative to the baseline of human thinking/heuristics.

WFLL1 seems to basically say: “here’s this axis of differential progress (arising from a limitation of DL), and here are some examples of ways things can go wrong as a result”. But for any other limitation on the list, I suspect we could produce similar examples: “if DL is really capable in general but really bad at causal modeling, here’s a thing that can go wrong.”

At least to me, the ease-of-measurement bullet point does not seem to pop out as a very natural category: if interpreted broadly, it does not capture everything that seems plausibly important, and if interpreted narrowly, it does not seem narrow enough to focus our attention on any one interesting failure mode.