Cool, yes, but I have two questions about it.
Firstly, given the narrow support of the prior (the result of processing the brain imaging data is expected to look like the original pictures), it seems less than breathtaking that the posterior (the reconstructed movie) is also narrow: it looks sort of like the original. How could it not? (A point that pedanterrific touches on here.) I don’t feel able to do more than raise the question, though. Perhaps someone more knowledgeable about the methods they use could comment.
I’m reminded of a possibly apocryphal story from the early days of speech recognition. As a demo, the researchers set up a computer chess game in which you would speak your move, and the computer would play it on screen. So a general (it was a military project) comes to see the demo, they invite him to speak a chess move, he coughs, and the machine responds with P-K4. The prior was too narrow: noises that weren’t moves at all were outside its support.
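To make the point about support concrete, here is a minimal sketch of a recogniser whose prior covers only chess moves. The vocabulary, the prior weights and the log-likelihood numbers are all made up for illustration, and this is not a claim about how the system in the story actually worked. It only shows that when "not a move" has no prior mass, Bayes' rule must hand back a move, however badly every move fits the sound.

```python
import math

# Hypothetical closed vocabulary: the only hypotheses the prior supports.
PRIOR = {"P-K4": 0.5, "P-Q4": 0.3, "N-KB3": 0.2}

def posterior(log_likelihoods, prior=PRIOR):
    """Bayes' rule over a closed hypothesis set: normalise prior * likelihood."""
    # Subtract the largest log-likelihood first so exp() cannot underflow to 0.
    shift = max(log_likelihoods.values())
    unnorm = {m: prior[m] * math.exp(log_likelihoods[m] - shift) for m in prior}
    total = sum(unnorm.values())
    return {m: w / total for m, w in unnorm.items()}

def recognise(log_likelihoods):
    """Return the most probable hypothesis; there is no 'none of the above'."""
    post = posterior(log_likelihoods)
    return max(post, key=post.get)

# A cough fits every move about equally badly (invented numbers).
cough = {"P-K4": -40.0, "P-Q4": -40.2, "N-KB3": -40.5}

print(recognise(cough))   # some move is always returned
print(posterior(cough))   # and the posterior still sums to 1
```

Normalisation hides how poor the fit was: the posterior over moves looks perfectly respectable even though the absolute likelihoods are tiny. A recogniser that could say "that was not a move" would need an explicit hypothesis for it, with some prior mass of its own.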
Secondly, it’s already known that the retina projects to a part of the visual cortex (V1) that spatially corresponds very closely to the retina. The part imaged includes V1 and the rest of the early visual areas (see end of first page of the paper), so it’s not very surprising that something can be recovered that looks to the human viewer somewhat similar to the original image. But only to the human viewer. The same human who can recognise an elephant in a movie is recognising an elephantish blob in the reconstruction. That does not mean that elephant-recognition is happening in the brain tissue that is being imaged. Are we merely seeing the image before it has undergone any substantial neural processing, and recovering what is left of the original, rather than what the brain is making of it? It’s cool that they’ve got that far, but this isn’t reading visual experience out of the brain (and isn’t claimed to be).
Another anecdote to amplify the point. In the early days of machine vision (say, the 60s and early 70s) people produced what they called “edge-detection” algorithms. The result of processing an image would be another image consisting, apparently, of all of the edges in the original image. But it looked that way only to the human eye. As far as the software was concerned, it was just another pixel array. The software did not know that there were any “edges” there, and if you tried to really detect “edges” by stipulating that any connected set of black pixels in the transformed image was an “edge”, you just got a mess. The software had made the edges more salient to the human eye, but it had not detected edges in any useful sense. All it did was apply a sharpening transformation that would nowadays be an ordinary Photoshop filter.
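A rough sketch of what that looks like in code, under my own assumptions: the synthetic image, the forward-difference filter, the threshold of 50 and the helper names are all illustrative, not a reconstruction of any historical algorithm. The filter returns a grid of numbers the same shape as its input. Getting "edges" as objects takes a further step (here, thresholding and grouping connected pixels), and that step is where things fall apart once the image is noisy.

```python
import random

def make_image(size=16, noise=0, seed=0):
    """Made-up test image: a bright square on a dark background, plus noise."""
    rng = random.Random(seed)
    return [[(200 if 4 <= y < 12 and 4 <= x < 12 else 0) + rng.randint(0, noise)
             for x in range(size)] for y in range(size)]

def gradient_magnitude(img):
    """Return |dx| + |dy| per pixel (forward differences); same shape as img."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][min(x + 1, w - 1)] - img[y][x]
            dy = img[min(y + 1, h - 1)][x] - img[y][x]
            out[y][x] = abs(dx) + abs(dy)
    return out

def connected_components(mask):
    """Count 4-connected groups of True pixels with an iterative flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count

def count_edges(img, threshold=50):
    """Naive rule: every connected blob of above-threshold pixels is one 'edge'."""
    filtered = gradient_magnitude(img)   # still just a grid of numbers
    mask = [[v > threshold for v in row] for row in filtered]
    return connected_components(mask)

print(count_edges(make_image(noise=0)))    # clean synthetic image
print(count_edges(make_image(noise=40)))   # same square with noise added
```

On the clean synthetic image the marked pixels form a single outline, which is about the best case for the naive rule. With noise, stray background pixels can cross the threshold and the outline can pick up specks or merge with them, so the count depends on the seed and the threshold rather than on anything in the scene.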
I wrote: “but this isn’t reading visual experience out of the brain (and isn’t claimed to be).”
I neglected to notice the title of the paper, “Reconstructing visual experiences from brain activity...” So they do claim to be reconstructing visual experiences from brain activity. What they are actually doing is reconstructing the pictures that the subjects were looking at.