Glyph Lefkowitz: The Numbers, They Lie
(published: Oct. 7, 2019, 6:25 a.m.)
It’s October, and we’re all getting ready for Halloween, so allow me to me tell you a horror story, in Python:
>>> 0.1 + 0.2 - 0.3 5.551115123125783e-17
Some of you might already be familiar with this chilling tale, but for those who might not have experienced it directly, let me briefly recap.
In Python, the default representation of a number with a decimal point in it is something called an “IEEE 754 double precision binary floating-point number”. This standard achieves a generally useful trade-off between performance, correctness, and is widely implemented in hardware, making it a popular choice for numbers in many programming language.
However, as our spooky story above indicates, it’s not perfect. 0.1 + 0.2 is very slightly less than 0.3 in this representation, because it is a floating-point representation in base 2.
If you’ve worked professionally with software that manipulates money1, you typically learn this lesson early; it’s quite easy to smash head-first into the problem with binary floating-point the first time you have an item that costs 30 cents and for some reason three dimes doesn’t suffice to cover it.
There are a few different approaches to the problem; one is using integers for everything, and denominating your transactions in cents rather than dollars. A strategy which requires less weird unit-conversion2, is to use the built-in
decimal module, which provides a floating-point base 10 representation, rather than the standard base-2, which doesn’t have any of these weird glitches surrounding numbers like 0.1.
This is often where a working programmer’s numerical education ends; don’t use floats, they’re bad, use decimals, they’re good. Indeed, this advice will work well up to a pretty high degree of application complexity. But the story doesn’t end there. Once division gets involved, things can still get weird really fast:
1 2 3
>>> from decimal import Decimal >>> (Decimal("1") / 7) * 14 Decimal('2.000000000000000000000000001')
The problem is the same: before, we were working with 1/10, a value that doesn’t have a finite (non-repeating) representation in base 2; now we’re working with 1/7, which has the same problem in base 10.
Any time you have a representation of a number which uses digits and a decimal point, no matter the base, you’re going to run in to some rational values which do not have an exact representation with a finite number of digits; thus, you’ll drop some digits off the (necessarily finite) end, and end up with a slightly inaccurate representation.
But Python does have a way to maintain symbolic accuracy for arbitrary rational numbers -- the
1 2 3 4 5
>>> from fractions import Fraction >>> Fraction(1)/3 + Fraction(2)/3 == 1 True >>> (Fraction(1)/7) * 14 == 2 True
You can multiply and divide and add and subtract to your heart’s content, and still compare against zero and it’ll always work exactly, giving you the right answers.
So if Python has a “correct” representation, which doesn’t screw up our results under a basic arithmetic operation such as division, why isn’t it the default? We don’t care all that much about performance, right? Python certainly trades off correctness and safety in plenty of other areas.
First of all, while Python’s willing to trade off some storage or CPU efficiency for correctness, precise fractions rapidly consume huge amounts of storage even under very basic algorithms, like consuming gigabytes while just trying to maintain a simple running average over a stream of incoming numbers.
But even more importantly, you’ll notice that I said we could maintain symbolic accuracy for arbitrary rational numbers; but, as it turns out, a whole lot of interesting math you might want to do with a computer involves numbers which are irrational: like π. If you want to use a computer to do it, pretty much all trigonometry3 involves a slightly inaccurate approximation unless you have a literally infinite amount of storage.
As Morpheus put it, “welcome to the desert of the ℝ”.