The AI in Mary’s room

In the Mary’s room thought experiment, Mary is a brilliant scientist in a black-and-white room who has never seen any colour. She can investigate the outside world through a black-and-white television, and has piles of textbooks on physics, optics, the eye, and the brain (and everything else of relevance to her condition). Through this she knows, intellectually, everything there is to know about colours and how humans react to them, but she hasn’t seen any colours at all.

So when she finally steps out of the room and sees red (or blue), does she learn anything? It seems that she does. Even if she doesn’t technically learn anything new, she experiences something she never has before, and her brain certainly changes in new ways.

The argument was intended as a defence of qualia against certain forms of materialism. It’s interesting, and I don’t intend to solve it fully here. But just as I extended Searle’s Chinese room argument by considering it from the perspective of an AI, this argument can also be considered from an AI’s perspective.

Consider an RL agent with a reward channel, which currently receives nothing from that channel. The agent can know everything there is to know about itself and the world. It can know about all sorts of other RL agents, and their reward channels. It can observe them getting their own rewards. Maybe it could even interrupt or increase their rewards. But all this knowledge will not get it any reward. As long as its own channel doesn’t send it the signal, knowledge of other agents’ rewards, even of identical agents getting rewards, does not give this agent any reward. Ceci n’est pas une récompense.
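To make that concrete, here is a minimal sketch in Python (the class and method names are hypothetical, not drawn from any particular RL library): the agent can hold a perfect record of another agent’s reward channel, and even watch that channel fire, without anything ever arriving on its own channel.

```python
class RewardChannel:
    """A bare reward channel: it just holds whatever signal it last emitted."""
    def __init__(self):
        self.signal = 0.0

    def emit(self, value):
        self.signal = value


class RLAgent:
    def __init__(self, own_channel):
        self.own_channel = own_channel
        self.world_model = {}  # can contain arbitrarily complete knowledge

    def observe(self, other_agent, label):
        # Perfect knowledge of another agent's reward, stored as a belief...
        self.world_model[label] = other_agent.own_channel.signal

    def reward(self):
        # ...but the only thing that counts as *its* reward is its own channel.
        return self.own_channel.signal


mary = RLAgent(RewardChannel())   # her channel never emits anything
other = RLAgent(RewardChannel())  # an identical agent
other.own_channel.emit(1.0)       # ...which does get a reward

mary.observe(other, "other_reward")
print(mary.world_model)   # {'other_reward': 1.0}
print(mary.reward())      # 0.0, knowing about a reward is not receiving one
```

The point of the sketch is that reward() consults only own_channel; nothing in world_model, however complete, ever feeds into it.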

The agent’s situation seems to mirror Mary’s quite well: knowing everything about the world is no substitute for actually getting the reward/seeing red. Now, an RL agent’s reward seems closer to pleasure than to qualia; this would correspond to a Mary brought up in a puritanical, pleasure-hating environment.

Closer to the original experiment, we could imagine that the AI is programmed to enter certain specific subroutines when presented with certain stimuli. The only way for the AI to start these subroutines is for the stimuli to actually be presented to it. Then, upon seeing red, the AI enters a completely new mental state, with new subroutines. The AI could know everything about its programming, and about the stimulus, and, intellectually, what would change about itself if it saw red. But until it actually saw red, it would not enter that mental state.
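As a hedged sketch of that setup (the names are again hypothetical; Python’s standard inspect module stands in for “knowing everything about its programming”): the agent can read the exact source of the subroutine that seeing red would trigger, yet only the stimulus itself moves it into the new state.

```python
import inspect


class StimulusDrivenAI:
    def __init__(self):
        self.mental_state = "baseline"

    def _seeing_red_subroutine(self):
        # Only reachable via an actual presentation of the stimulus.
        self.mental_state = "seeing red"

    def introspect(self):
        # Complete intellectual knowledge of its own programming:
        # the full source of the subroutine, without running it.
        return inspect.getsource(self._seeing_red_subroutine)

    def present_stimulus(self, stimulus):
        if stimulus == "red":
            self._seeing_red_subroutine()


ai = StimulusDrivenAI()
print(ai.introspect())      # knows exactly what seeing red would do to it
print(ai.mental_state)      # "baseline": the knowledge alone changed nothing
ai.present_stimulus("red")
print(ai.mental_state)      # "seeing red": only the stimulus itself does
```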

If we use ⬜ to (informally) denote “knowing all about”, then ⬜(X→Y) does not imply Y. Here X and Y could be “seeing red” and “the mental experience of seeing red”. I could have simplified that by saying that ⬜Y does not imply Y. Knowing about a mental state, even perfectly, does not put you in that mental state.
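For what it’s worth, the same claim in display form, with \Box standing in for the informal ⬜ (“knowing all about”, not the usual “knows that” of epistemic logic):

```latex
% Informal rendering of the claim above; \Box is the "knowing all about"
% operator, X = "seeing red", Y = "the mental experience of seeing red".
\[
  \Box\,(X \to Y) \;\not\Rightarrow\; Y
  \qquad \text{and, more simply,} \qquad
  \Box\, Y \;\not\Rightarrow\; Y .
\]
```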

This setup closely resembles the original Mary’s room experiment. And it seems that if anyone insists that certain features are necessary to the intuition behind Mary’s room, then these features could be added to this model as well.

Mary’s room is fascinating, but it doesn’t seem to be talking about humans exclusively, or even about conscious entities.