Message 256873 - Python tracker

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

In-reply-to
Author	johnwalker
Recipients	johnwalker
Date	2015年12月22日.22:18:24
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1450822705.24.0.946819992184.issue25928@psf.upfronthosting.co.za>

Content
In statistics, there is a FIXME on Line 250 above _decimal_to_ratio that says: # FIXME This is faster than Fraction.from_decimal, but still too slow. Half of the time is spent in a conversion in d.as_tuple(). Decimal internally stores the digits as a string, but in d.as_tuple(), the digits are individually cast to integers and returned as a tuple of integers. This is OK, but _decimal_to_ratio undoes the work that was done in d.as_tuple() by adding them all back into an integer. A similar, but slightly different approach is taken in Fractions.from_decimal, where the tuple is cast into a string and then parsed into an integer. We can be a lot faster if we use the _int instance variable directly. In the case of _decimal_to_ratio, the new code seems to be twice as fast with usage _decimal_to_ratio(Decimal(str(random.random()))): def _decimal_to_ratio(d): sign, exp = d._sign, d._exp if exp in ('F', 'n', 'N'): # INF, NAN, sNAN assert not d.is_finite() return (d, None) num = int(d._int) if exp < 0: den = 10*-exp else: num = 10*exp den = 1 if sign: num = -num return (num, den) If the performance improvement is considered worthwhile, here are a few solutions I see. 1) Use _int directly in fractions and statistics. 2) Add a digits_as_str method to Decimal. This prevents exposing _int as an implementation detail, and makes sense to me since I suspect there is a lot of code casting the tuple of int to a string anyway. 3) Add a parameter to as_tuple for determining whether digits should be returned as a string or a tuple. 4) Deprecate _int in Decimal and add a new reference str_digits. There are probably more solutions. I lean towards 4, because it makes usage easier and avoids cluttering Decimal with methods. Here is what I used for benchmarks: ======== import timeit old_setup = """ import random from decimal import Decimal def _decimal_to_ratio(d): sign, digits, exp = d.as_tuple() if exp in ('F', 'n', 'N'): # INF, NAN, sNAN assert not d.is_finite() return (d, None) num = 0 for digit in digits: num = num10 + digit if exp < 0: den = 10*-exp else: num = 10exp den = 1 if sign: num = -num return (num, den) def run_it(): dec = Decimal(str(random.random())) _decimal_to_ratio(dec) """ new_setup = """ import random from decimal import Decimal def _decimal_to_ratio(d): sign, exp = d._sign, d._exp if exp in ('F', 'n', 'N'): # INF, NAN, sNAN assert not d.is_finite() return (d, None) num = int(d._int) if exp < 0: den = 10-exp else: num = 10*exp den = 1 if sign: num = -num return (num, den) def run_it(): dec = Decimal(str(random.random())) _decimal_to_ratio(dec) """ if __name__ == '__main__': print("Testing proposed implementation") print("number = 10000") print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=10000)) print("number = 100000") print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=100000)) print("number = 1000000") print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=1000000)) print("Testing old implementation") print("number = 10000") print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=10000)) print("number = 100000") print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=100000)) print("number = 1000000") print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=1000000))

Content

In statistics, there is a FIXME on Line 250 above _decimal_to_ratio that says:
# FIXME This is faster than Fraction.from_decimal, but still too slow.
Half of the time is spent in a conversion in d.as_tuple(). Decimal internally stores the digits as a string, but in d.as_tuple(), the digits are individually cast to integers and returned as a tuple of integers.
This is OK, but _decimal_to_ratio undoes the work that was done in d.as_tuple() by adding them all back into an integer. A similar, but slightly different approach is taken in Fractions.from_decimal, where the tuple is cast into a string and then parsed into an integer. We can be a lot faster if we use the _int instance variable directly.
In the case of _decimal_to_ratio, the new code seems to be twice as fast with usage _decimal_to_ratio(Decimal(str(random.random()))):
def _decimal_to_ratio(d):
 sign, exp = d._sign, d._exp
 if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
 assert not d.is_finite()
 return (d, None)
 num = int(d._int)
 if exp < 0:
 den = 10**-exp
 else:
 num *= 10**exp
 den = 1
 if sign:
 num = -num
 return (num, den)
If the performance improvement is considered worthwhile, here are a few solutions I see.
1) Use _int directly in fractions and statistics.
2) Add a digits_as_str method to Decimal. This prevents exposing _int as an implementation detail, and makes sense to me since I suspect there is a lot of code casting the tuple of int to a string anyway. 
3) Add a parameter to as_tuple for determining whether digits should be returned as a string or a tuple.
4) Deprecate _int in Decimal and add a new reference str_digits.
There are probably more solutions. I lean towards 4, because it makes usage easier and avoids cluttering Decimal with methods. 
Here is what I used for benchmarks:
========
import timeit
old_setup = """
import random
from decimal import Decimal
def _decimal_to_ratio(d):
 sign, digits, exp = d.as_tuple()
 if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
 assert not d.is_finite()
 return (d, None)
 num = 0
 for digit in digits:
 num = num*10 + digit
 if exp < 0:
 den = 10**-exp
 else:
 num *= 10**exp
 den = 1
 if sign:
 num = -num
 return (num, den)
def run_it():
 dec = Decimal(str(random.random()))
 _decimal_to_ratio(dec)
"""
new_setup = """
import random
from decimal import Decimal
def _decimal_to_ratio(d):
 sign, exp = d._sign, d._exp
 if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
 assert not d.is_finite()
 return (d, None)
 num = int(d._int)
 if exp < 0:
 den = 10**-exp
 else:
 num *= 10**exp
 den = 1
 if sign:
 num = -num
 return (num, den)
def run_it():
 dec = Decimal(str(random.random()))
 _decimal_to_ratio(dec)
"""
if __name__ == '__main__':
 print("Testing proposed implementation")
 print("number = 10000")
 print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=10000))
 print("number = 100000") 
 print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=100000))
 print("number = 1000000") 
 print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=1000000))
 print("Testing old implementation")
 print("number = 10000")
 print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=10000))
 print("number = 100000") 
 print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=100000))
 print("number = 1000000") 
 print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=1000000))

History
Date	User	Action	Args
2015年12月22日 22:18:25	johnwalker	set	recipients: + johnwalker
2015年12月22日 22:18:25	johnwalker	set	messageid: <1450822705.24.0.946819992184.issue25928@psf.upfronthosting.co.za>
2015年12月22日 22:18:25	johnwalker	link	issue25928 messages
2015年12月22日 22:18:24	johnwalker	create

homepage