Finding the variables

In a previous post on bridging syntax and semantics, I mentioned how to empirically establish that the internal symbols S_i represented the variables W_j in the environment: if the S_i have high mutual information with the W_j. This basically asks whether you can find out about the values of the W_j by knowing the S_i. See also Luke Muehlhauser’s mention of “representation” and the articles linked therein.
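As a concrete version of that test, here is a minimal sketch (my own, not from the original post) of estimating the mutual information between one internal symbol S and one environment variable W from paired observations; the toy guard/intruder data and the simple plug-in estimator are illustrative assumptions.

```python
# Minimal sketch: plug-in estimate of the mutual information I(S; W) in bits
# from paired samples of an internal symbol S and an environment variable W.
# High mutual information means knowing S tells you a lot about W.
from collections import Counter
from math import log2

def mutual_information(s_samples, w_samples):
    """Plug-in estimate of I(S; W) in bits from paired observations."""
    n = len(s_samples)
    count_s = Counter(s_samples)
    count_w = Counter(w_samples)
    count_sw = Counter(zip(s_samples, w_samples))
    mi = 0.0
    for (s, w), c in count_sw.items():
        p_joint = c / n
        mi += p_joint * log2(p_joint / ((count_s[s] / n) * (count_w[w] / n)))
    return mi

# Toy data (invented): S is the guard's "intruder detected" symbol,
# W is whether an intruder is actually present; S tracks W with one error.
S = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
W = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
print(mutual_information(S, W))  # about 0.6 bits: S is informative about W
```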

At the end of that post, I mentioned the problem of finding the S_i if they were not given. This post will briefly look over that problem, and the related problem of finding the W_j.

Waterfall and variables in the world

Given the internal variable S, it is almost certainly possible to find a variable W in the outside world that correlates with it (even if we assume a Cartesian separation between the agent and the world, so we can’t just do the lazy thing and set W = S).

In the example of detecting an intruder in a greenhouse, look at S, the internal variable of a guard who peers into the greenhouse to see an intruder.

Then we can certainly come up with a variable W that correlates with S. This could be a variable that correlates with whether there is an intruder in the greenhouse in situations where the guard can see it, and then correlates with all the issues that might fool the guard: mannequins, delusion-inducing gases, intruders disguised as tables, etc.

But we don’t even need W to be anything like the variables that S was ‘supposed’ to measure. If we have a chaotic system in the vicinity—say a nearby waterfall—then we can just list all the states of that system that happen when S = 1 vs those that happen when S = 0, and set W to be 1 or 0 in those states.

That is a variant of Scott Aaronson’s waterfall argument: if you have enough variety of states, and you can construct definitions of arbitrary complexity, then you can “ground” any model in these definitions. To avoid this, we have to penalise this definitional complexity: the definition is doing all the work here, and is itself a highly complicated algorithm to implement.
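To see how cheap the waterfall move is, here is a toy construction (mine, loosely in the spirit of the argument above, not code from Aaronson or the post): we “define” a world variable W by a brute-force lookup over recorded waterfall states, and it matches S perfectly, but the definition itself carries all the complexity.

```python
# Toy construction of the "waterfall" move: define a world variable W that
# matches any internal variable S perfectly, using nothing but a lookup
# table over the chaotic system's states. The table, not the waterfall,
# does all the work -- which is why definitional complexity must be penalised.
import random

random.seed(0)
timesteps = 20
S = [random.randint(0, 1) for _ in range(timesteps)]            # the internal variable
waterfall = [random.getrandbits(32) for _ in range(timesteps)]  # chaotic, effectively unique states

# The "definition" of W: map each observed waterfall state to whatever S was.
definition_of_W = {state: s for state, s in zip(waterfall, S)}

def W(state):
    return definition_of_W[state]

# W agrees with S on every recorded timestep...
assert all(W(state) == s for state, s in zip(waterfall, S))
# ...but the definition needs one entry per observation.
print(len(definition_of_W), "lookup entries needed to define W")
```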

So pick the W_j so that (a rough scoring sketch follows this list):

  • the complexity of defining the W_j is low, and

  • the W_j have intrinsically relevant definitions, definitions that make sense without direct or indirect knowledge of the S_i.
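One rough way to operationalise the first criterion is to score candidate W_j by how informative they are about the S_i minus a penalty on how long their definitions are; the scoring function, the description-length proxy, and the weight lam below are my own illustrative assumptions, and the second criterion (intrinsic relevance) still has to be judged separately.

```python
# Rough, assumed operationalisation (not a formula from the post): prefer
# world variables that are informative about the internal variable but
# cheap to define. Numbers below are purely illustrative.
def score_world_variable(mi_with_s, definition_length_bits, lam=0.01):
    """Higher is better: mutual information minus a complexity penalty."""
    return mi_with_s - lam * definition_length_bits

# "Is there an intruder?" -- short definition, decent correlation with S.
print(score_world_variable(mi_with_s=0.6, definition_length_bits=50))    # 0.1
# Waterfall lookup table -- perfect correlation, but a huge definition.
print(score_world_variable(mi_with_s=1.0, definition_length_bits=640))   # -5.4
```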

There are some edge cases, of course—if a human has S being their estimate of whether a swan is around, it might be useful to distinguish between “there is a swan” and “there is a white swan”, as this tells us whether the human was conceptualising black swans as swans. But in general, the W_j should be defined by concepts that make sense on their own, and don’t take the S_i into account.

Variables in the mind

Now assume that the W_j are something reasonable. What of the S_i? Well, imagine a superintelligence had access to an agent’s entire sensory input. If the superintelligence had a decent world model, it could use that input to construct a best estimate as to the value of a given W_j, and call that estimate, which is a function of the internal state of the agent, S*. Even if we limited the superintelligence to only accessing some parts of the agent—maybe just the short term memory, or the conscious states—it could still construct an S* that is likely a far better correlate of W_j than anything the agent could construct or naturally has access to.

For example, if W_j were temperature (as in this post), then an AI could deduce temperature information from human sensory data much better than our subjective “it feels kinda hot/cold in here”.
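As a toy illustration of that gap (entirely synthetic data, not from the post), the sketch below fits a least-squares estimate S* of temperature from simulated sensory signals and compares its correlation with the true temperature against a coarse, noisy “feels hot” variable.

```python
# Synthetic illustration: an outside observer with access to many sensory
# signals can construct an estimate S* of the external temperature W that
# correlates with W far better than the agent's coarse subjective feeling.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
W = rng.uniform(10, 35, n)                                  # external temperature (deg C)

# Simulated sensory/internal signals, each a noisy function of W.
skin = W + rng.normal(0, 3, n)                              # skin-temperature signal
shiver = (W < 15).astype(float) + rng.normal(0, 0.3, n)     # shivering response
sweat = (W > 28).astype(float) + rng.normal(0, 0.3, n)      # sweating response
X = np.column_stack([skin, shiver, sweat, np.ones(n)])

# The agent's own variable: a crude, noisy "feels cold / ok / hot" level.
S_feel = np.digitize(skin + rng.normal(0, 4, n), [18, 27])

# The observer's estimate S*: a least-squares fit from the signals to W.
coef, *_ = np.linalg.lstsq(X, W, rcond=None)
S_star = X @ coef

print("corr(S_feel, W) =", round(np.corrcoef(S_feel, W)[0, 1], 2))
print("corr(S*, W)     =", round(np.corrcoef(S_star, W)[0, 1], 2))
# S* tracks W much more closely than the subjective feeling does.
```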

So the S_i should be selected according to criteria other than correlation with the W_j. For algorithms, we could look at named variables within them. For humans, we could also look at variables that correspond to names or labels (for example, when you ask a human “are you feeling hot?”, what parts of the brain are triggered when that question is asked, and what parts correspond to the articulated answer being “yes”).
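For the algorithmic case, “look at named variables” can be taken almost literally; the sketch below uses a hypothetical ThermostatAgent of my own to show the idea of selecting S by the program’s own named symbol rather than by searching the agent’s state for the best correlate of some W_j.

```python
# Illustrative only: for an algorithmic agent, pick the internal variable S
# by its name/label in the program, not by correlation-maximising over the
# agent's whole state.
class ThermostatAgent:
    def __init__(self):
        self.feels_hot = False      # the named internal symbol we care about
        self._raw_sensor = 0.0      # low-level state we deliberately ignore

    def observe(self, reading):
        self._raw_sensor = reading
        self.feels_hot = reading > 25.0

agent = ThermostatAgent()
agent.observe(28.3)
S = vars(agent)["feels_hot"]   # selected by name, not by searching for correlates
print(S)  # True
```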

Unless we are specifically interested in speech acts, we can’t just say “S_h corresponds to the human answering ‘yes’ when asked about how hot they feel”. Nevertheless, when attempting to define a “feeling of hotness” variable S_h, we should be defining it with all our knowledge (and the human’s knowledge) of what that means: for example, the fact that humans often answer ‘yes’ to that question when they indeed do feel hot.

So the S_i should be defined by taking some concept and seeking to formalise how humans use it or implement it, not by correlating it with the W_j.

We can sometimes justify a more correlated S_i, if the concept is natural for the human in question. For example, we could take a human and train them to estimate temperature. After a while, they will develop an internal temperature estimator S_t which is more highly correlated with the temperature W_j, but which corresponds naturally to something the human can consciously access; we could check this by, for example, getting the human to write down their temperature estimate.

We can also imagine the variable S_e, which is an untrained human’s explicit estimate of temperature; we’d expect this to be a bit better than the raw feeling S_h, just because the human can explicitly take into account things like fever, or temperature acclimatisation. But it’s not clear whether S_e is really an intrinsic variable in the brain, or something constructed specifically by the human to answer that question at that moment.
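The expected ordering of these correlations can be shown with made-up numbers (the noise levels and the fever confounder below are invented purely for illustration): the raw feeling S_h does worst, the explicit untrained estimate S_e a bit better, and the trained, written-down estimate S_t best.

```python
# Synthetic illustration of the expected ordering:
#   corr(S_h, W) < corr(S_e, W) < corr(S_t, W)
# where W is the true temperature, S_h the raw feeling, S_e the untrained
# explicit estimate (corrects for fever etc.), S_t the trained estimate.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
W = rng.uniform(10, 35, n)                   # true temperature
fever = rng.normal(0, 3, n)                  # body-state confounder

S_h = W + fever + rng.normal(0, 4, n)        # raw feeling: confounded and noisy
S_e = S_h - fever + rng.normal(0, 1, n)      # explicit estimate: corrects the confounder
S_t = W + rng.normal(0, 1.5, n)              # trained estimate, written down as a number

for name, S in [("S_h", S_h), ("S_e", S_e), ("S_t", S_t)]:
    print(name, round(np.corrcoef(S, W)[0, 1], 2))
```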

Things can get more murky if we allow for unconscious feelings. Suppose someone has a relatively accurate gut instinct as to whether other people are trustworthy, but barely makes use of that instinct consciously. Then it’s tricky to decide whether that instinct is a natural internal variable (which is highly correlated with trustworthiness), or an input into the human’s conscious estimate (which is weakly correlated with trustworthiness).

Investigation, not optimisation

So this method is very suitable for checking the correlations between internal variables and external ones, variables that we have defined through some other process. It can answer questions like:

  • “Is a human’s subjective feeling of heat a good estimate of temperature?” (not really).

  • “Is a trained human’s temperature guess a good estimate of temperature?” (somewhat).

  • “Is a human’s subjective feeling of there being someone else in the room a good estimate of the presence of an intruder?” (yes, very much so).

  • “Does this brain activity mean that the human detects an intruder?” (possibly).

But it all falls apart if we try and use the correlation as an optimisation measure, shifting the S_i to better measure the W_j, or vice versa.