[Python-Dev] Backport new float repr to Python 2.7?

Sun Oct 11 20:28:11 CEST 2009

In a recent #python-dev IRC conversation, it was suggested that we
should consider backporting the new-style float repr from py3k to
trunk. I'd like to get people's opinions on this idea.

To recap quickly, the algorithm for computing the repr of floats changed
between Python 2.x and Python 3.x (well, actually between 3.0 and 3.1,
but 3.0 is dead):
 - in Python 2.x, repr(x) computes 17 significant decimal digits, and
 then strips trailing zeros. In other words, it's pretty much identical
 to doing '%.17g' % x. The computation is done using the platform's
 *printf functions.
 - in Python 3.x, repr(x) returns the shortest decimal string that's
 guaranteed to evaluate back to the float x under correct rounding.
 The computation is done using David Gay's dtoa.c code, adapted
 for inclusion in Python (in file Python/dtoa.c).
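
To make the two rules concrete, here is a minimal sketch meant for a 3.x
interpreter. The helper names repr_2x_style and shortest_roundtrip are made
up for illustration. The brute-force search is only a stand-in for the idea
of 'shortest string that round-trips': it is not how dtoa.c works, and its
output format is not guaranteed to match repr(x) for every float.

```python
def repr_2x_style(x):
    """Approximate the 2.x rule: 17 significant digits via '%.17g'.

    Hypothetical helper; the real 2.x repr also appends '.0' to
    integral values, which this sketch ignores.
    """
    return '%.17g' % x


def shortest_roundtrip(x):
    """Brute-force stand-in for the 3.x rule (hypothetical helper).

    Try increasing precisions and return the first string that
    converts back to exactly the same float.
    """
    for precision in range(1, 18):
        s = '%.*g' % (precision, x)
        if float(s) == x:
            return s
    # Fallback for values that never compare equal (e.g. nan).
    return '%.17g' % x


for value in (0.02, 0.03, 0.1):
    # Columns: 2.x-style string, brute-force shortest, built-in repr.
    print(repr_2x_style(value), shortest_roundtrip(value), repr(value))
```

On a 3.1+ interpreter with IEEE 754 doubles, the first column should show
the long 17-digit forms where they arise, and the last two columns the
short ones.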

There are (in my view) many benefits to the new approach. Among
them:
 - fewer newbie complaints and questions (on c.l.p, IRC, Stack
 Overflow, etc.) about Python 'rounding incorrectly'. Whether this is a
 good thing or not is a matter of some debate (I'm tempted to
 borrow the time machine and simply say 'see the replies
 to this message'!)
 - string to float *and* float to string conversions are both guaranteed
 correctly rounded in 3.x: David Gay's code implements the conversion
 in both directions, and having correctly rounded string -> float
 conversions is essential to ensure that eval(repr(x)) recovers x exactly.
 - the repr of round(x, n) really does have at most n digits after the
 point, giving the semi-illusion that x really has been rounded exactly,
 and eliminating one of the most common user complaints about the
 round function.
 - round(x, n) agrees exactly with the value given by
 '{:.{}f}'.format(x, n) (this isn't true in Python 2.x, and the
 difference is a cause of bug reports)
 - side effects like finding that float(x) rounds correctly for
 Decimal instances x.
 - the output from the new rule is more consistent. The 'strip trailing
 zeros' part of the old rule has some strange consequences; e.g.,
 in 2.x right now (on a typical machine):
 >>> 0.02
 0.02
 >>> 0.03
 0.029999999999999999
 even though neither 0.02 nor 0.03 can be exactly represented
 in binary. 3.x gives '0.02' and '0.03'.
 - repr(x) is consistent across platforms (or at least across platforms
 with IEEE 754 doubles; in practice this seems to account for
 virtually all platforms currently running Python).
 - the float <-> string conversions are under our control, so any bugs
 found can be fixed in the Python source. There's no shortage of
 conversion bugs in the wild, and certainly bugs have been observed in
 OS X, Linux and Windows. (The ones I found in OS X 10.5 have
 been fixed in OS X 10.6, though.)
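
A couple of the claims in the list above can be spot-checked from a 3.x
prompt. This is only a sketch: the helper names are invented for
illustration, the sample values are arbitrary, and passing on a handful of
inputs is of course not a proof of the general guarantee.

```python
import random


def roundtrips(x):
    """True if evaluating repr(x) as a float gives back exactly x."""
    return float(repr(x)) == x


def round_matches_format(x, n):
    """True if round(x, n) equals the value of the '%.nf'-style string."""
    return round(x, n) == float('{:.{}f}'.format(x, n))


# Round trip on a fixed pseudo-random sample (seed chosen arbitrarily).
random.seed(12345)
samples = [random.uniform(-1e6, 1e6) for _ in range(1000)]
print(all(roundtrips(x) for x in samples))

# round() versus formatting on values that are not exactly representable.
for x, n in ((2.675, 2), (1.005, 2), (0.03, 1)):
    print(x, n, repr(round(x, n)), '{:.{}f}'.format(x, n),
          round_matches_format(x, n))
```

Under the 3.x behaviour described above, every check should come out True;
running the same comparisons under 2.x is where disagreements between round
and string formatting would be expected to show up.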

Possible problems:
 - breaking doctests in third-party code, though Eric reminded me
 that when we implemented this for 3.1, there were essentially no
 standard library test breakages resulting from the changed repr
 format.
 - some might argue that the new repr (and round) just allows users
 to remain ignorant of floating-point difficulties for longer, and that
 this is a bad thing. I don't really buy either of these points.
 - someone has to put in the work. As mentioned below, I'm happy
 to do this (and Eric's offered to help, without which this probably
 wouldn't be feasible at all), but it'll use cycles that I could also
 usefully be spending elsewhere.
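
For the first problem above, the sketch below shows the kind of breakage
in question, assuming the concern is doctest-style examples embedded in
docstrings. The example texts and the count_failures helper are invented
for illustration. It says nothing about how common such examples are in
real third-party code.

```python
import doctest

# An example written against the 2.x repr (17 significant digits).
OLD_STYLE = """
>>> 0.03
0.029999999999999999
"""

# The same example written against the short repr.
NEW_STYLE = """
>>> 0.03
0.03
"""


def count_failures(text):
    """Run one doctest-formatted string and return the number of failures."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, 'example', None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts are of interest here.
    results = runner.run(test, out=lambda message: None)
    return results.failed


print(count_failures(OLD_STYLE), count_failures(NEW_STYLE))
```

On an interpreter with the short repr, the old-style example is the one
that fails; on 2.x as it stands today it would be the other way round.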

I'm mostly neutral on the backport idea: I'm very happy that this is
in 3.x, but don't see any great need to backport it. But if there's
majority (+BDFL) support, I'm willing to put the work in to do the
backport.

Masochists who are still reading by this point and who want more
information about the new repr implementation can see the issue
discussion:

http://bugs.python.org/issue1580

Thoughts?

Mark