First of all, wow, great read! The non-technical explanations in the first half made it easy to map those concepts to the technical notation in the second half.
The hardest part to understand for me was the idea of absolute bits of optimization. I resolved my confusion after some closer examination, but I’d suggest two changes that would strengthen the explanation:
1. The animated graph showing increasing bits of optimization has a very unfortunate zoom level. This, combined with the fat left side of the graph, gave me the false impression that the area was simply being reduced by a third of its total as the optimization bits counted up to a maximum of ~3 (2.93 to be exact). I eventually realized there’s presumably a long tail extending to the right, past the range we can see, but that wasn’t clear from the start. Even knowing this, it’s still hard to get my brain to parse the area as being progressively cut in half. I would hesitate to change the shape of this graph (it’s important to understand that it’s the area being halved, not just horizontal progress), but I think zooming out the domain would make it feel much more like “approaching infinite optimization”.
2. This is a minor point, but I think the “in terms of” in this sentence is needlessly vague:
Then, the absolute optimization of a specific state x is in terms of the probability mass above it, that is, to the right of it on the x-axis.
It took a bit for me to understand, “oh, bits of optimization is just calculated as the inverse of p(x)”. Maybe it would be clearer to say that from the start? Not sure what the exact math terms are, but something like:
Then, the absolute optimization of a state x is [the proportional inverse of] the probability mass above it, that is, to the right of it on the x-axis.
Thanks, that’s useful feedback! I fiddled with that graph animation a bit, and yeah, I think it’s off: it doesn’t seem to be calculating the area correctly, so the first “bit” is actually highlighting less than half the area (even counting a long tail). It seemed finicky to debug and I wasn’t sure it mattered, but hearing that it confused somebody makes me more inclined to redo it.
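For reference, here is a minimal sketch of what "each bit halves the remaining area" would mean numerically. It assumes that absolute optimization in bits is the negative base-2 log of the probability mass at or above the state. The exponential distribution and all function names are stand-ins chosen for illustration, not what the animation actually uses.

```python
import math


def absolute_bits(mass_above: float) -> float:
    """Bits of optimization for a state with `mass_above` probability mass at or above it.

    Assumed definition: bits = -log2(mass_above).
    """
    if not 0.0 < mass_above <= 1.0:
        raise ValueError("mass_above must be in (0, 1]")
    return -math.log2(mass_above)


def mass_remaining(bits: float) -> float:
    """Fraction of total probability mass still to the right after `bits` bits."""
    return 2.0 ** -bits


# Stand-in distribution (an assumption): exponential with rate 1.
# It has a fat left side and a long right tail, and its survival
# function (mass to the right of x) is exp(-x).
def exp_survival(x: float) -> float:
    return math.exp(-x)


def exp_threshold(bits: float) -> float:
    """The x at which exactly 2**-bits of the exponential's mass lies to the right."""
    return bits * math.log(2)


if __name__ == "__main__":
    for k in range(1, 4):
        x = exp_threshold(k)
        print(f"{k} bit(s): threshold x = {x:.3f}, mass to the right = {exp_survival(x):.4f}")
```

Under these assumptions the first bit should leave exactly half of the total area to the right of the threshold, and 2.93 bits corresponds to roughly 13% of the area remaining, which may be a useful sanity check when redoing the animation.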