The look-up table idea can be quite misleading. In essence you need something other than a look-up table, or your look-up table needs to be properly huge.
The room should pass a challenge like
U: “Hello” R: “Hello” U: “Blue” R: “Excuse me?” U: “What did I just say?” R: “Blue”
Doing this with a fixed book that you can’t write on is quite challenging. It can be done with a book that has instructions like “conversation history is: [...], give ‘blue’ to user”, but such a book is so huge that avoiding a black-hole weight is hard even with minimal energy use per single instruction (not that we are concerned with physics in thought-experiment land).
However, I think it is better to think that the room operator has access to a big pile of blank paper and can copy symbols onto the papers as the instructions direct, and that rules can condition on a symbol being drawn on a piece of paper. To be convinced that this “expansion” adds no relevant capabilities, the setup should be pinned down exactly, but I claim that without loss of generality one can think of the filled papers as being in an ordered pile on which only the top paper is visible, and the book can contain orders like “put top paper to the bottom of the pile” etc.
Then an echo request becomes a rather simple paper-dance rather than 30 000 different commands spanning the whole vocabulary of Chinese, one for each possibility of what the content of the echo request might be.
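To make the paper-dance concrete, here is a minimal sketch in Python. Everything in it is my own illustration and not part of the original thought experiment: the function name `run_room`, the use of a `deque` as the pile, and the three toy rules are all assumptions. The point it tries to show is that the echo rule never looks at which symbol it is copying.

```python
from collections import deque


def run_room(user_messages):
    """Toy room operator: a fixed rule book plus a pile of papers.

    Hypothetical sketch. The rules below stand in for the book; the pile
    is the only thing that changes between messages.
    """
    pile = deque()  # ordered pile; pile[0] is the top (only visible) paper
    replies = []
    for message in user_messages:
        if message == "What did I just say?":
            # Rule: copy whatever is on the top paper to the output.
            # The rule does not depend on what that symbol is.
            replies.append(pile[0] if pile else "Excuse me?")
        elif message == "Hello":
            replies.append("Hello")
        else:
            replies.append("Excuse me?")
        # Rule: copy the incoming symbols onto a blank paper, put it on top.
        pile.appendleft(message)
    return replies


conversation = ["Hello", "Blue", "What did I just say?"]
print(run_room(conversation))
```

One rule covers every possible content of the echo request, where the IO-table version would need a separate entry for each word that could have been said.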
A related claim is that if you have a specification of a Turing machine, running it requires absolutely zero idea of what the symbols mean, and that a Turing machine with a fixed symbol-replacement table can have quite dynamic and variable-like behaviour. The book of the room operator is the symbol-replacement table and not the table of inputs to outputs (although given such an IO table, constructing a paper-shuffle table that realises it should always be possible).
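As a rough illustration of the difference between a symbol-replacement table and an IO table, here is a small Turing machine runner in Python. The runner, the state names and the binary-increment table are all assumptions chosen for the example. The runner only matches and copies symbols; it has no idea that "0" and "1" are digits, yet the six fixed rules handle inputs of any length.

```python
def run_turing_machine(table, tape, state="start", blank="_", max_steps=10_000):
    """Apply a fixed (state, symbol) -> (write, move, next_state) table.

    Hypothetical sketch: the operator only looks up and copies symbols.
    """
    cells = dict(enumerate(tape))  # tape position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        write, move, state = table[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Six fixed rules that add one to a binary number of any length.
INCREMENT = {
    ("start", "0"): ("0", "R", "start"),  # walk right to the end of the input
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),  # past the end: turn around
    ("carry", "1"): ("0", "L", "carry"),  # 1 plus carry gives 0, keep carrying
    ("carry", "0"): ("1", "L", "halt"),   # 0 plus carry gives 1, done
    ("carry", "_"): ("1", "L", "halt"),   # ran off the left edge: new digit
}

print(run_turing_machine(INCREMENT, "1011"))
```

An IO table for the same task would need one row per possible input, which is the sense in which the book is better thought of as the replacement table.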
To the point about being text-biased: sense organs are connected to the brain via nerves. It is possible to find a spatial surface that goes through a human where all cognitively relevant data is inside the nerve signaling (approximately the spine; one could argue that hormonal signals should be included). A nerve can either be firing or not be firing, so a bit is a faithful representation of its informational content. One can do time-slices where, say, each 10 microseconds gets its own slice. With three slices, that gives 100 if the firing happens early, 010 if it happens a bit later and 001 if it happens in the last slice. Say that you have 3 nerves: you can combine their time-content representations together into a representation of the whole, such as 100010010 for one nerve firing early and 2 others firing simultaneously after that. If you do this so that it contains all the nerves and as much time as the extended present spans, this is a sufficient representation of the data that the brain works on (it is not missing anything; claiming that it is gets into ESP territory (which, for these purposes, adrenaline counts as)).
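The time-slice encoding can be sketched in a few lines of Python. The function name `encode_firings` and the choice to describe each nerve by the set of slices in which it fires are assumptions made for the example.

```python
def encode_firings(firing_slices, slices_per_window):
    """Concatenate one fixed-width bit window per nerve.

    firing_slices[i] is the set of slice indices in which nerve i fires
    (hypothetical input format). A "1" marks a slice with a firing.
    """
    windows = []
    for slices in firing_slices:
        window = ["0"] * slices_per_window
        for index in slices:
            window[index] = "1"
        windows.append("".join(window))
    return "".join(windows)


# One nerve fires early, two others fire together one slice later.
print(encode_firings([{0}, {1}, {1}], slices_per_window=3))  # 100010010
```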
Then one can take every 5 bits and use a scheme like 00001=A, 00010=B, 00011=C … to have it as a string of letters (that different nerves and times mix is interesting but inessential) (chop into chunks of 5 letters if thinking in words is preferred). This covers the data coming from the eye, the data coming from the nose, the data coming from the inner ear etc. Similarly for commands going out to arms and such. Thus text has to be sufficient to detail a human’s experience of the world, and this is not at all dependent on what data formats the senses use or what the qualia are. If we had a text-to-picture conversion and a picture-to-text conversion, we might next be asking about the possibility of smell-to-text conversions, and so on even after we exhaust all known “obvious” senses. With this setup we know that any “electronic sense” gets accounted for.
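A sketch of the bits-to-letters step, in Python. The scheme above only spells out A, B and C, so the rest of this table is an assumption: I continue the letters up to Z and pick arbitrary filler symbols for 00000 and for the five patterns above Z. Padding the tail with zeros is likewise just one possible convention.

```python
import string

# Assumed 32-symbol table: index 1..26 are A..Z as in 00001=A, 00010=B, ...
# Index 0 and indices 27..31 are hypothetical fillers.
SYMBOLS = "_" + string.ascii_uppercase + "01234"


def bits_to_text(bits):
    """Read the bit string 5 bits at a time and emit one symbol per chunk."""
    padded = bits + "0" * (-len(bits) % 5)  # pad the tail with zeros
    return "".join(
        SYMBOLS[int(padded[i:i + 5], 2)] for i in range(0, len(padded), 5)
    )


def text_to_bits(text):
    """Inverse direction: each symbol becomes its 5-bit pattern."""
    return "".join(format(SYMBOLS.index(ch), "05b") for ch in text)


print(bits_to_text("000010001000011"))  # ABC
print(bits_to_text("100010010"))        # the three-nerve example, zero-padded
```

Because the mapping is invertible up to the padding, nothing in the bit string is lost by writing it as letters, which is all the argument needs.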
While words are transmitted as phonons and are more multidimensional than this, to the extent that humans can cognitively engage with them they have to appear through this nerve-representation layer.