Thanks, you are right. I need to add an assumption: “given everything else the same.” We need to exclude surprisal changes caused by variations in correctness/plausibility, which can be distinguished relatively easily through other methods. I have also added some content to the article, including: “At least right now, one thing I can speculate is: Shannon information seems to represent the upper bound on ‘importance.’”
About the “Aslan” example: the high “information” it carries is reasonable, since the explanation that follows is quite informative for those who are interested. After the “upper bound” addition, this example becomes reasonable, right? :)
About the “James Pennebaker research” example, I understand it this way: each of the seemingly least important words carries a little bit of information about the state and mental health of the author. The researchers collect those bits of information and make use of them. As long as “information/surprisal” is measured precisely enough (LLMs are good at this, and improving fast), these applications are all reasonable from an information-theoretic view.
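To make the surprisal idea concrete, here is a toy sketch of how per-word information is computed. The probabilities are entirely made up for illustration (a real setup would read them from a language model); only the formula, surprisal = -log2(p), is standard:

```python
import math

def surprisal_bits(p: float) -> float:
    """Shannon surprisal of an event with probability p, in bits."""
    return -math.log2(p)

# Hypothetical next-word probabilities in some context.
# A common word like "the" is expected; a rare name like "Aslan" is not.
word_probs = {"the": 0.5, "cat": 0.05, "Aslan": 0.001}

for word, p in word_probs.items():
    print(f"{word}: {surprisal_bits(p):.2f} bits")
```

The point of the sketch is just that rarer (more surprising) words carry more bits, and even low-surprisal function words each contribute a small, nonzero amount that can add up across a text.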
Hope it makes sense to you :)