Economic Definition of Intelligence?

Followup to: Efficient Cross-Domain Optimization

Shane Legg once produced a catalogue of 71 definitions of intelligence. Looking it over, you’ll find that the 18 definitions in dictionaries and the 35 definitions of psychologists are mere black boxes containing human parts.

However, among the 18 definitions from AI researchers, you can find such notions as

“Intelligence measures an agent’s ability to achieve goals in a wide range of environments” (Legg and Hutter)

or

“Intelligence is the ability to optimally use limited resources—including time—to achieve goals” (Kurzweil)

or even

“Intelligence is the power to rapidly find an adequate solution in what appears a priori (to observers) to be an immense search space” (Lenat and Feigenbaum)

which is about as close as you can get to my own notion of “efficient cross-domain optimization” without actually measuring optimization power in bits.
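(For concreteness, here is one way to cash out “optimization power in bits” as a rarity measure, roughly in the spirit of the previous post; the notation is purely illustrative, with S the space of possible outcomes and ⪰ the agent’s preference ordering:)

```latex
% Optimization power of an achieved outcome o, in bits:
% the negative log of the fraction of outcomes the agent ranks
% at least as highly as o.
\mathrm{OP}(o) \;=\; -\log_2 \frac{\bigl|\{\, s \in S : s \succeq o \,\}\bigr|}{|S|}
```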

But Robin Hanson, whose AI background we’re going to ignore for a moment in favor of his better-known identity as an economist, at once said:

“I think what you want is to think in terms of a production function, which describes a system’s output on a particular task as a function of its various inputs and features.”

Economists spend a fair amount of their time measuring things like productivity and efficiency. Might they have something to say about how to measure intelligence in generalized cognitive systems?

This is a real question, open to all economists. So I’m going to quickly go over some of the criteria-of-a-good-definition that stand behind my own proffered suggestion on intelligence, and what I see as the important challenges to a productivity-based view. It seems to me that this is an important sub-issue of Robin’s and my persistent disagreement about the Singularity.

(A) One of the criteria involved in a definition of intelligence is that it ought to separate form and function. The Turing Test fails this—it says that if you can build something indistinguishable from a bird, it must definitely fly, which is true but spectacularly unhelpful in building an airplane.

(B) We will also prefer quantitative measures to qualitative measures that only say “this is intelligent or not intelligent”. Sure, you can define “flight” in terms of getting off the ground, but what you really need is a way to quantify aerodynamic lift and relate it to other properties of the airplane, so you can calculate how much lift is needed to get off the ground, and calculate how close you are to flying at any given point.

(C) So why not use the nicely quantified IQ test? Well, imagine if the Wright Brothers had tried to build the Wright Flyer using a notion of “flight quality” built around a Fly-Q test standardized on the abilities of the average pigeon, including various measures of wingspan and air maneuverability. We want a definition that is not parochial to humans.

(D) We have a nice system of Bayesian expected utility maximization. Why not say that any system’s “intelligence” is just the average utility of the outcome it can achieve? But utility functions are invariant up to a positive affine transformation, i.e., if you add 3 to all utilities, or multiply all by 5, it’s the same utility function. If we assume a fixed utility function, we would be able to compare the intelligence of the same system on different occasions—but we would like to be able to compare intelligences with different utility functions.
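(To spell out the invariance: for any constants a > 0 and b, the transformed function represents exactly the same preferences, so the raw numerical scale of “average utility achieved” carries no information across different agents:)

```latex
% A positive affine transformation of a utility function
U'(x) \;=\; a\,U(x) + b, \qquad a > 0
% ranks outcomes, and expected utilities of gambles, exactly as U does,
% so utility numbers are not comparable between different utility functions.
```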

(E) And by much the same token, we would like our definition to let us recognize intelligence by observation rather than presumption, which means we can’t always start off assuming that something has a fixed utility function, or even any utility function at all. We can have a prior over probable utility functions, which assigns a very low probability to overcomplicated hypotheses like “the lottery wanted 6-39-45-46-48-36 to win on October 28th, 2008”, but higher probabilities to simpler desires.

(F) Why not just measure how well the intelligence plays chess? But in real-life situations, plucking the opponent’s queen off the board or shooting the opponent is not illegal; it is creative. We would like our definition to respect the creative shortcut—to not define intelligence into the box of a narrow problem domain.

(G) It would be nice if intelligence were actually measurable using some operational test, but this conflicts strongly with criteria F and D. My own definition essentially tosses this out the window—you can’t actually measure optimization power on any real-world problem any more than you can compute the real-world probability update or maximize real-world expected utility. But, just as you can wisely wield algorithms that behave sorta like Bayesian updates or increase expected utility, there are all sorts of possible methods that can take a stab at measuring optimization power.

(H) And finally, when all is said and done, we should be able to recognize very high “intelligence” levels in an entity that can, oh, say, synthesize nanotechnology and build its own Dyson Sphere. Nor should we assign very high “intelligence” levels to something that couldn’t build a wooden wagon (even if it wanted to, and had hands). Intelligence should not be defined too far away from that impressive thingy we humans sometimes do.

Which brings us to production functions. I think the main problems here would lie in criteria D and E.

First, a word of background: In Artificial Intelligence, it’s more common to spend your days obsessing over the structure of a problem space—and when you find a good algorithm, you use that algorithm and pay however much computing power it requires. You aren’t as likely to find a situation where there are five different algorithms competing to solve a problem and a sixth algorithm that has to decide where to invest a marginal unit of computing power. Not that computer scientists haven’t studied this as a specialized problem. But it’s ultimately not what AI folk do all day. So I hope that we can both try to appreciate the danger of déformation professionnelle.

Robin Hanson said:

“Eliezer, even if you measure output as you propose in terms of a state space reduction factor, my main point was that simply ‘dividing by the resources used’ makes little sense.”

I agree that “divide by resources used” is a very naive method, rather tacked-on by comparison. If one mind gets 40 bits of optimization using a trillion floating-point operations, and another mind achieves 80 bits of optimization using two trillion floating-point operations, even in the same domain using the same utility function, they may not at all be equally “well-designed” minds. One of the minds may itself be a lot more “optimized” than the other (probably the second one).

I do think that measuring the rarity of equally good solutions in the search space smooths out the discussion a lot, more than any other simple measure I can think of. You’re not just presuming that 80 units are twice as good as 40 units, but trying to give some measure of how rare 80-unit solutions are in the space; if they’re common, it will take less “optimization power” to find them and we’ll be less impressed. This likewise helps when comparing minds with different preferences.
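(As a toy illustration of the rarity idea, entirely my own sketch with made-up numbers: enumerate or sample the search space, score every candidate by the agent’s own preferences, and count how rare equally-good-or-better solutions are:)

```python
"""Toy illustration (not an official method) of measuring optimization
power as the rarity of equally-good-or-better solutions.

Assumes we can enumerate or sample the search space and score every
candidate with the agent's own preference ordering."""

import math
import random


def optimization_power_bits(achieved_score, all_scores):
    """Bits of optimization: -log2 of the fraction of candidates
    scoring at least as well as the achieved solution."""
    at_least_as_good = sum(1 for s in all_scores if s >= achieved_score)
    return -math.log2(at_least_as_good / len(all_scores))


# Hypothetical example: a search space of a million random candidates.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(1_000_000)]

# A weak optimizer lands at the 90th percentile; a stronger one at the top.
weak = sorted(scores)[int(0.9 * len(scores))]
strong = max(scores)

print(optimization_power_bits(weak, scores))    # ~3.3 bits
print(optimization_power_bits(strong, scores))  # ~19.9 bits (1 in 1e6)
```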

But some search spaces are just easier to search than others. I generally choose to talk about this by hiking the “optimization” metric up a meta-level: how easy is it to find an algorithm that searches this space? There’s no absolute easiness, unless you talk about simple random selection, which I take as my base case. Even if a fitness gradient is smooth—a very simple search—natural selection would creep along it by incremental neighborhood search, while a human would leap through the space by, say, looking at the first and second derivatives. Which of these is the “inherent easiness” of the space?
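(One illustrative way to see the contrast, with hypothetical update rules supplied just for this example: an incremental neighborhood search versus a second-order leap on the same smooth fitness function f:)

```latex
% Incremental neighborhood search: move to the best point in the
% local neighborhood N(x_t).
x_{t+1} \;=\; \arg\max_{x \in N(x_t)} f(x)

% Derivative-based leap: a Newton step toward a stationary point,
% using the first and second derivatives.
x_{t+1} \;=\; x_t - \frac{f'(x_t)}{f''(x_t)}
```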

Robin says:

“Then we can talk about partial derivatives; rates at which output increases as a function of changes in inputs or features… Yes a production function formulation may abstract from some relevant details, but it is far closer to reality than dividing by ‘resources.’”

A partial derivative divides the marginal output by the marginal resource. Is this so much less naive than dividing total output by total resources?
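(In standard production-function notation, with output Y = f(x_1, …, x_n) and the other inputs held fixed, the two quantities being compared are:)

```latex
% Marginal product of input i (the partial derivative Robin proposes):
MP_i \;=\; \frac{\partial Y}{\partial x_i}

% Average product of input i ("divide total output by total resources"):
AP_i \;=\; \frac{Y}{x_i}
```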

I confess that I said “divide by resources” just to have some measure of efficiency; it’s not a very good measure. Still, we need to take resources into account somehow—we don’t want natural selection to look as “intelligent” as humans: human engineers, given 3.85 billion years and the opportunity to run 1e44 experiments, would produce products overwhelmingly superior to biology.

But this is really establishing an ordering based on superior performance with the same resources, not a quantitative metric. I might have to be content with a partial ordering among intelligences, rather than being able to quantify them. If so, one of the ordering characteristics will be the amount of resources used, which is what I was getting at by saying “divide by total resources”.

The idiom of “division” is based around things that can be divided, that is, fungible resources. A human economy based on mass production has lots of these. In modern-day computing work, programmers use fungible resources like computing cycles and RAM, but tend to produce much less fungible outputs. Informational goods tend to be mostly non-fungible: two copies of the same file are worth around as much as one, so every worthwhile informational good is unique. If I draw on my memory to produce an essay, neither the sentences of the essay, nor the items of my memory, will be substitutable for one another. If I create a unique essay by drawing upon a thousand unique memories, how well have I done, and how much resource have I used?

Economists have a simple way of establishing a kind of fungibility-of-valuation between all the inputs and all the outputs of an economy: they look at market prices.

But this just palms off the problem of valuation on hedge funds. Someone has to do the valuing. A society with stupid hedge funds ends up with stupid valuations.

Steve Omohundro has pointed out that for fungible resources in an AI—and computing power is a fungible resource on modern architectures—there ought to be a resource balance principle: shifting a unit of resource from any one task to any other should produce a decrease in expected utility, relative to the AI’s own probability function that determines the expectation. To the extent any of these things have continuous first derivatives, shifting an infinitesimal unit of resource between any two tasks should have no effect on expected utility. This establishes “expected utilons” as something akin to a central currency within the AI.
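(Written as a first-order condition, which is my own sketch of the principle: with r_i the resource allocated to task i and R the total budget, an optimally balanced allocation equalizes the marginal expected utility of resource across all tasks:)

```latex
% Allocate a fixed resource budget across tasks:
\max_{r_1,\dots,r_n} \; \mathbb{E}\!\left[U(r_1,\dots,r_n)\right]
\quad \text{s.t.} \quad \sum_i r_i = R

% At an interior optimum, the marginal expected utility of resource is
% the same on every task -- the shared "price" in expected utilons.
\frac{\partial\,\mathbb{E}[U]}{\partial r_i} \;=\; \lambda \qquad \text{for all } i
```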

But this gets us back to the problems of criteria D and E. If I look at a mind and see a certain balance of resources, is that because the mind is really cleverly balanced, or because the mind is stupid? If a mind would rather have two units of CPU than one unit of RAM (and how can I tell this by observation, since the resources are not readily convertible?) then is that because RAM is inherently twice as valuable as CPU, or because the mind is twice as stupid in using CPU as RAM?

If you can assume the resource-balance principle, then you will find it easy to talk about the relative efficiency of alternative algorithms for use inside the AI, but this doesn’t give you a good way to measure the external power of the whole AI.

Similarly, assuming a particular relative valuation of resources, as given by an external marketplace, doesn’t let us ask questions like “How smart is a human economy?” Now the relative valuation a human economy assigns to internal resources can no longer be taken for granted—a more powerful system might assign very different relative values to internal resources.

I admit that dividing optimization power by “total resources” is handwaving—more a qualitative way of saying “pay attention to resources used” than anything you could actually quantify into a single useful figure. But I pose an open question to Robin (or any other economist) to explain how production theory can help us do better, bearing in mind that:

  • Informational inputs and outputs tend to be non-fungible;
  • I want to be able to observe the “intelligence” and “utility function” of a whole system without starting out assuming them;
  • I would like to be able to compare, as much as possible, the performance of intelligences with different utility functions;
  • I can’t assume a priori any particular breakdown of internal tasks or “ideal” valuation of internal resources.

I would finally point out that all data about the market value of human IQ only applies to variances of intelligence within the human species. I mean, how much would you pay a chimpanzee to run your hedge fund?