Checking a number’s precision correctly is quite trivial, and there were one-line fixes I could have applied that would make the function work properly on all numbers, not just some of them.
I’m really curious about what such fixes look like. In my experience, those edge cases tend to come about when there is some set of mutually incompatible desired properties of a system, but the mutual incompatibility isn’t obvious. For example:
1. We want to use standard IEEE754 floating point numbers to store our data.
2. If two numbers are not equal to each other, they should not have the same string representation.
3. The sum of two numbers should have a precision no higher than the operand with the highest precision. For example, adding 0.1 + 0.2 should yield 0.3, not 0.30000000000000004.
It turns out those are mutually incompatible requirements!
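A quick way to see the conflict (Python shown here for illustration; any IEEE754 double behaves the same way):

```python
# Requirement 1: store everything as standard IEEE754 doubles.
a = 0.1 + 0.2

# Requirement 3 fails: the sum picks up spurious precision.
print(a)         # 0.30000000000000004
print(a == 0.3)  # False

# Requirement 2 then forces the ugly output: since a != 0.3,
# the two values must not share a string representation, so we
# cannot simply display a as "0.3".
print(repr(a) == repr(0.3))  # False
```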
You could say “we should drop requirement 1 and use a fixed point or fraction datatype” but that’s emphatically not a one line change, and has its own places where you’ll run into mutually incompatible requirements.
Or you could add a “duct tape” solution like “use printf("%.2f", result) in the case where we actually ran into this problem, where we know both operands have 2-decimal precision, and revisit if this bug comes up again in a different context”.
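A minimal sketch of that duct-tape fix, in Python for consistency with the other examples (the function name is hypothetical; the assumption that both operands carry exactly two decimal places is the whole trick):

```python
def add_2dp(a: float, b: float) -> str:
    """Duct-tape fix: we happen to know both operands have
    2-decimal precision, so format the sum to 2 places,
    the Python equivalent of printf("%.2f", a + b)."""
    return f"{a + b:.2f}"

print(add_2dp(0.1, 0.2))  # "0.30"
```

Note it prints "0.30" rather than "0.3"; stripping the trailing zero would be yet another piece of tape.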
The sum of two numbers should have a precision no higher than the operand with the highest precision. For example, adding 0.1 + 0.2 should yield 0.3, not 0.30000000000000004.
I would argue that the precision should be capped at the lowest precision of the operands. In physics, if you add two lengths, 0.123m + 0.123456m should be rounded to 0.246m.
Also, IEEE754 fundamentally does not contain information about the precision of a number. If you want to track that information correctly, you can use two floating point numbers and do interval arithmetic. There is even an IEEE standard for that nowadays.
Of course, this comes at a cost. While monotonic functions can be converted for interval arithmetic, the general problem of finding the extremal values of a function in some high-dimensional domain is a hard problem. That said, if you know how the function is composed out of simpler operations, you can at least find some bounds.
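A toy version of the two-floats idea (class and names are illustrative, not a real library; a proper implementation, e.g. one following IEEE 1788, would also round the lower bound down and the upper bound up at every operation, which plain Python arithmetic does not do):

```python
class Interval:
    """Track a quantity as a pair [lo, hi] of doubles."""
    def __init__(self, lo: float, hi: float):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Addition is monotonic in both arguments, so the
        # endpoints just add.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # For multiplication the extremes can come from any
        # corner (signs matter), so take min/max of all four.
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))

x = Interval(0.9, 1.1)  # a measurement of 1.0 with +/-0.1 uncertainty
y = Interval(1.9, 2.1)
s = x + y
print(s.lo, s.hi)  # roughly 2.8 to 3.2
```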
Or you could do what physicists do (at least when they are taking lab courses) and track physical quantities with a value and a precision, and do uncertainty propagation. (This might not be 100% kosher in cases where you first calculate multiple intermediate quantities from the same measurement (whose error will thus not be independent) and continue to treat them as if they were. But that might just give you bigger errors.) Also, this relies on your function being sufficiently well-described in the region of interest by the partial derivatives at the central point. If you calculate the uncertainty of f(x,y)=xy for x=0.1±1, y=0.1±1 using the partial derivatives you will not have fun.
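The f(x,y) = xy example above, worked out with first-order propagation (assuming independent errors, so σ_f² = (y·σ_x)² + (x·σ_y)²; function names are illustrative):

```python
import math

def propagate_product(x, sx, y, sy):
    """First-order (linear) uncertainty propagation for f = x*y,
    assuming independent errors: sf^2 = (y*sx)^2 + (x*sy)^2."""
    f = x * y
    sf = math.hypot(y * sx, x * sy)
    return f, sf

f, sf = propagate_product(0.1, 1.0, 0.1, 1.0)
print(f, sf)  # 0.01 with a claimed uncertainty of only ~0.14

# But the actual range of x*y over x, y in [-0.9, 1.1] is roughly
# [-0.99, 1.21]: the linearization wildly underestimates the
# spread, because the partial derivatives at (0.1, 0.1) are tiny
# compared to how much the function varies over the interval.
```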
In the general case I agree it’s not necessarily trivial; e.g. if your program uses the whole range of decimal places to a meaningful degree, or performs calculations that can compound floating point errors up to higher decimal places. (Though I’d argue that in both of those cases pure floating point is probably not the best system to use.) In my case I knew that the intended precision of the input would never be precise enough to overlap with floating point errors, so I could just round anything past the 15th decimal place down to 0.
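The fix described above really can be tiny. A sketch in Python, valid only under the stated assumption that meaningful input precision never reaches that deep:

```python
def clean(x: float) -> float:
    """Round away floating point noise past the 15th decimal
    place; safe only when intended input precision can never
    overlap with float representation error."""
    return round(x, 15)

print(clean(0.1 + 0.2) == 0.3)  # True
```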
Hmm, interesting. The exact choice of decimal place at which to cut off the comparison is certainly arbitrary, and that doesn’t feel very elegant. My thinking is that within the constraint of using floating point numbers, there fundamentally isn’t a perfect solution. Floating point notation changes some numbers into other numbers, so there are always going to be some cases where number comparisons are wrong. What we want to do is define a problem domain and check if floating point will cause problems within that domain; if it doesn’t, go for it, if it does, maybe don’t use floating point.
In this case my fix solves the problem for what I think is the vast majority of the most likely inputs (in particular it solves it for all the inputs that my particular program was going to get), and while it’s less fundamental than e.g. using arbitrary-precision arithmetic, it does better on the cost-benefit analysis. (Just like how “completely overhaul our company” addresses things on a more fundamental level than just fixing the structural simulation, but may not be the best fix given resource constraints.)
The main purpose of my example was not to argue that my particular approach was the “correct” one, but rather to point out the flaws in the “multiply by an arbitrary constant” approach. I’ll edit that line, since I think you’re right that it’s a little more complicated than I was making it out to be, and “trivial” could be an unfair characterization.
BTW as a concrete note, you may want to sub in 15 - ceil(log10(n)) instead of just “15”, which really only matters if you’re dealing with numbers above 10 (e.g. 1000 is represented as 0x408F400000000000, while the next float 0x408F400000000001 is 1000.000000000000114, which differs in the 13th decimal place).
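In Python terms, that magnitude-aware version of the fix might look like this (a sketch; the n == 0 case is special-cased since log10 is undefined there, and math.nextafter needs Python 3.9+):

```python
import math

def clean(n: float) -> float:
    """Round away float noise, scaling the cutoff with the
    magnitude of n, per the 15 - ceil(log10(n)) suggestion:
    doubles carry ~15-16 significant digits in total, so large
    numbers have fewer valid decimal places."""
    if n == 0:
        return 0.0
    digits = 15 - math.ceil(math.log10(abs(n)))
    return round(n, digits)

# The next double after 1000.0 differs in the 13th decimal place
# and gets rounded back:
big = math.nextafter(1000.0, math.inf)
print(clean(big) == 1000.0)  # True
```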
That makes sense. I think I may have misjudged your post, as I expected that you would classify that kind of approach as a “duct tape” approach.
It’s duct tapes all the way down!