I appreciate this write-up! I felt that the answer to “is the truth of a statement represented somewhere” is obviously no (elaboration below). However, I was surprised by the “tell a story about purpose / natural fact” angle. It seems that a malfunctioning heart is “wrong” in a different, more obvious sense than, e.g., a false mathematical statement.
Here are the points that I thought were obvious (and this still makes sense to me):
If you measure correlation between real things (horse at night) and NN activations (looks like a horse at night) there will be a mismatch.
This feels like a kind of “skill issue” to me: your class of things (which included “horse at night” but not “looks like a horse at night”) wasn’t sufficiently wide or good. We should just recognise that, say, the examples correlated with this direction are all consistent with the NN thinking something is a horse, and then label it correctly. Obviously this is hard (impossible? especially if the NN is smarter than us), but the issue doesn’t seem to be that the NN made a mistake per se. Instead, the issue was choosing only real physical things as possible labels.
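To make the mismatch concrete, here is a minimal toy sketch. Everything in it is made up for illustration: the data, the label names, and the assumption that the direction fires exactly when the input looks like a horse. It only shows that the same activation correlates imperfectly with the “real thing” label and perfectly with the “looks like” label.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: (is_horse, looks_like_horse) for each example.
samples = (
    [(1, 1)] * 4    # horses by day
    + [(0, 0)] * 4  # cows by day
    + [(1, 1)] * 2  # horses at night
    + [(0, 1)] * 2  # cows at night that look like horses
)
is_horse = [s[0] for s in samples]
looks_like_horse = [s[1] for s in samples]

# Assumption: the direction fires exactly when the input looks like a horse.
activation = [float(v) for v in looks_like_horse]

corr_real = pearson(activation, is_horse)
corr_looks = pearson(activation, looks_like_horse)
print(f"vs 'is a horse':         {corr_real:.3f}")
print(f"vs 'looks like a horse': {corr_looks:.3f}")
```

On this toy data the correlation with “is a horse” comes out around 0.71, while the correlation with “looks like a horse” is 1. The mismatch goes away once the wider label is allowed, without the NN having done anything differently.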
It’s hard to define “misrepresentation” as a physical phenomenon based on the physical atoms present.
This seems somewhat obvious to me: of course there is no representation of whether someone is “right” or “wrong” in the atoms! That’d be cheating! Imagine writing “The Riemann hypothesis is true” or “The runtime of busy beaver (99) is an even number”. The computation the universe performs when you write this down is less than what it would take to decide whether the statement (at least the second one) is correct!
However, at the end of the post you suggest there might actually be a way to get to a definition of misrepresentation or malfunction based on physical atoms only: The (correct / intended) representation is that which makes a more natural (less surprising?) story. The heart having the purpose of pumping blood explains its design a lot better than it having the purpose of making a thump-thump sound, even when all you have is a malfunctioning heart.