Realism about rationality

Epistemic status: trying to vaguely gesture at vague intuitions. A similar idea was explored here under the heading "the intelligibility of intelligence", although I hadn't seen it before writing this post.

There's a mindset which is common in the rationalist community, which I call "realism about rationality" (the name being intended as a parallel to moral realism). I feel like my skepticism about agent foundations research is closely tied to my skepticism about this mindset, and so in this essay I try to articulate what it is.

Humans ascribe properties to entities in the world in order to describe and predict them. Here are three such properties: "momentum", "evolutionary fitness", and "intelligence". These are all pretty useful properties for high-level reasoning in the fields of physics, biology and AI, respectively. There's a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn't just because biologists haven't figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated "function" which basically requires you to describe that organism's entire phenotype, genotype and environment.

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It's a mindset which makes the following ideas seem natural:

  • The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don't count brute force approaches like AIXI for the same reason I don't consider physics a simple yet powerful description of biology).

  • The idea that there is an "ideal" decision theory.

  • The idea that AGI will very likely be an "agent".

  • The idea that Turing machines and Kolmogorov complexity are foundational for epistemology.

  • The idea that, given certain evidence for a proposition, there's an "objective" level of subjective credence which you should assign to it, even under computational constraints.

  • The idea that Aumann's agreement theorem is relevant to humans.

  • The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.

  • The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn't depend very much on morally arbitrary factors.

  • The idea that having contradictory preferences or beliefs is really bad, even when there's no clear way that they'll lead to bad consequences (and you're very good at avoiding Dutch books and money pumps and so on).

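As an aside on that last idea, the standard argument that contradictory preferences are bad is precisely the money pump: an agent with cyclic preferences will pay to trade around the cycle and end up back where it started, strictly poorer. A minimal sketch (the items, fee, and preference cycle are all invented for illustration):

```python
# Toy money pump: an agent with cyclic strict preferences A > B > C > A
# pays a small fee for every trade it prefers. A trader cycling through
# offers extracts unbounded money while the agent's holdings never improve.

# prefers[x] = the set of items the agent strictly prefers to x
prefers = {"A": {"C"}, "B": {"A"}, "C": {"B"}}

def money_pump(start_item, offers, fee=1):
    """Total fees paid when the agent accepts every strictly-preferred trade."""
    item, paid = start_item, 0
    for offered in offers:
        if offered in prefers[item]:  # agent strictly prefers the offered item
            item, paid = offered, paid + fee
    return paid

# Starting with A, the agent accepts C, then B, then A again, and so on:
# after three full cycles it holds its original item but is 9 fees poorer.
print(money_pump("A", ["C", "B", "A"] * 3))  # 9
```

The point of the bullet above is that realism about rationality treats inconsistency as bad even in agents who, unlike this toy one, never actually face such a trader.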
To be clear, I am neither claiming that realism about rationality makes people dogmatic about such ideas, nor claiming that they're all false. In fact, from a historical point of view I'm quite optimistic about using maths to describe things in general. But starting from that historical baseline, I'm inclined to adjust downwards on questions related to formalising intelligent thought, whereas rationality realism would endorse adjusting upwards. This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks). It's true that "messy" human intelligence is able to generalise to a wide variety of domains it hadn't evolved to deal with, which supports rationality realism, but analogously an animal can be evolutionarily fit in novel environments without implying that fitness is easily formalisable.

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism, CEV), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

Another gesture towards the thing: a popular metaphor for Kahneman and Tversky's dual process theory is a rider trying to control an elephant. Implicit in this metaphor is the localisation of personal identity primarily in the system 2 rider. Imagine reversing that, so that the experience and behaviour you identify with are primarily driven by your system 1, with a system 2 that is mostly a Hansonian rationalisation engine on top (one which occasionally also does useful maths). Does this shift your intuitions about the ideas above, e.g. by making your CEV feel less well-defined? I claim that the latter perspective is just as sensible as the former, and perhaps even more so—see, for example, Paul Christiano's model of the mind, which leads him to conclude that "imagining conscious deliberation as fundamental, rather than a product and input to reflexes that actually drive behavior, seems likely to cause confusion."

These ideas have been stewing in my mind for a while, but the immediate trigger for this post was a conversation about morality which went along these lines:

R (me): Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it's fine to accept that our moral preferences may contain some contradictions.
O (a friend): You can't just accept a contradiction! It's like saying "I have an intuition that 51 is prime, so I'll just accept that as an axiom."
R: Morality isn't like maths. It's more like having tastes in food, and then having preferences that the tastes have certain consistency properties—but if your tastes are strong enough, you might just ignore some of those preferences.
O: For me, my meta-level preferences about the ways to reason about ethics (e.g. that you shouldn't allow contradictions) are so much stronger than my object-level preferences that this wouldn't happen. Maybe you can ignore the fact that your preferences contain a contradiction, but if we scaled you up to be much more intelligent, running on a brain orders of magnitude larger, having such a contradiction would break your thought processes.
R: Actually, I think a much smarter agent could still be weirdly modular like humans are, and work in such a way that describing it as having "beliefs" is still a very lossy approximation. And it's plausible that there's no canonical way to "scale me up".

I had a lot of difficulty in figuring out what I actually meant during that conversation, but I think a quick way to summarise the disagreement is that O is a rationality realist, and I'm not. This is not a problem, per se: I'm happy that some people are already working on AI safety from this mindset, and I can imagine becoming convinced that rationality realism is a more correct mindset than my own. But I think it's a distinction worth keeping in mind, because assumptions baked into underlying worldviews are often difficult to notice, and also because the rationality community has selection effects favouring this particular worldview even though it doesn't necessarily follow from the community's founding thesis (that humans can and should be more rational).