To start, I propose a different frame to help you. Instead of asking “How do I get intuition about information theory?”, ask “How is information theory informing my intuitions?”
It looks to me like it’s more central than Bayes’ Theorem, and that it provides essential context for why and how that theorem is relevant for rationality.
You’ve already noticed that this is “deep” and “widely applicable.” Another way of saying these things is “abstract,” and abstraction reflects generalizations over some domain of experience. These generalizations are exactly the sort of thing that forms heuristics and intuitions to apply to more specific cases.
To the meat of the question:
Step 1) grok the core technology (which you seem to have already started).
Step 2) ask yourself the aforementioned question.
Step 3) try to apply it to as many domains as possible.
Step 4) as you run into trouble with 3), restart from 1).
When you find yourself looking for more of 1) from where you are now, I recommend at least checking out Shannon’s original paper on information, “A Mathematical Theory of Communication.” I find his writing style to be much more approachable than average for what is a highly technical paper. Be careful when reading, though, because his writing is very dense, with each sentence carrying a lot of information ;)
I was in a similar position, but I am now at a point where I believe ADHD is negatively affecting my life in a way that has overturned my desire to not take medication. It’s hard to predict the future, but if you have a cheap or free way to get a diagnosis, I would recommend doing so for your own knowledge and to maybe make getting prescriptions in the future a smidge easier. I think it’s really believable that in your current context there are few or no negative repercussions to your ADHD, if you have it. But it’s hard to be certain of your future contexts, or even to know which aspects of your context would have to change for your symptoms to act (sufficiently) negatively.