[Python-Dev] Re: Memory size overflows

2002年10月25日 13:07:14 -0400

I wrote:
> Shall I submit a patch?

Tim Peters wrote:
> Sure, but also submit your timing harness so that people can measure the
> effects cross-platform and cross-compiler.

Of course. I first wanted to see if there was any interest.
Integrated into Python, the speedup was much less: only 0.1%
overall for this million-multiply script:
 import time
 start = time.time()
 a = -1000
 while a <= 1000:
 b = 1000
 while b <= 1000:
 b += 1
 a += 1
 end = time.time()
 base = end - start
 start = time.time()
 a = -1000
 while a <= 1000:
 b = -1000
 while b <= 1000:
 b += 1
 c = a * b
 a += 1
 end = time.time()
 print end - start - base
The difference in speedup is an interesting data point on
Python overhead in general (it took just over 6 seconds to
do 1,000,000 multiplies on my system).
> Which platform? Which compiler? What was your test driver? Was this
> timing the mult code in isolation, or timing Python-level multiplies?
> Claims of small speedups are notoriously platform- and test-dependent. If
> it's a mixed bag across platforms, the risk of introducing a new bug would
> favor leaving things alone. In the absence of a clear correctness proof, a
> Python simulation program demonstrating correctness exhaustively in small
> bases would also be helpful.

I doubt you're still interested, but I used Cygwin on a
P3/833 Thinkpad running Win2K (GCC compiler). My original
results were on just the multiply code in isolation--I
wanted to see if there was interest before continuing.
Since I have them, I attached my initial testbed (new.c
is the final version), and here is an exhaustive test
for the "cannot_overflow" check used in that version:
 from __future__ import generators
 def cannot_overflow(a,b,m):
 M_MASK = (1 << m) - 1
 HI_LO_BIT = 1 << ((m/2) - 1)
 OVL_IMPOSSIBLE_MASK = HI_LO_BIT - 1
 OVL_POSSIBLE_MASK = M_MASK ^ OVL_IMPOSSIBLE_MASK
 if a < 0:
 check = OVL_POSSIBLE_MASK
 else:
 check = 0
 if (a & OVL_POSSIBLE_MASK) == check:
 if b < 0:
 check = OVL_POSSIBLE_MASK
 else:
 check = 0
 if (b & OVL_POSSIBLE_MASK) == check:
 return True
 return False
 def overflowed(a,b,m):
 VALID_MASK = (2**(m - 1)) - 1
 INVALID_MASK = (2**(2*m) - 1) ^ VALID_MASK
 prod = a * b
 highbits = prod & INVALID_MASK
 return (highbits != 0) and (highbits != INVALID_MASK)
 def checkall(m):
 for i in range(-(2**(m-1)),2**(m-1)):
 for j in range(-(2**(m-1)),2**(m-1)):
 if cannot_overflow(i,j,m) and overflowed(i,j,m):
 yield (i,j)
 if __name__ == "__main__":
 for m in range(2,11,2):
 print "M ==", m
 for i in checkall(m):
 print i
 print "END OF LIST"
You can compare the "cannot_overflow" code to the code that
would be added to int_mul. For the sake of minimizing impact,
I limited the changes to defining UL_OVERFLOW_POSSIBLE_MASK
as the equivalent of ~(2**(m/2-1) - 1) and this code, added
immediately after converting the arguments into longs:
 {
 unsigned long ma = a & UL_OVERFLOW_POSSIBLE_MASK;
 if (ma == ((a < 0) ? UL_OVERFLOW_POSSIBLE_MASK : 0))
 {
 unsigned long mb = b & UL_OVERFLOW_POSSIBLE_MASK;
 if (mb == ((b < 0) ? UL_OVERFLOW_POSSIBLE_MASK : 0))
 {
 return PyInt_FromLong(a*b);
 }
 }
 }
This seems pretty safe, but the payout seems too small
to warrant taking this any further. Unless someone asks
me to do so, I'm planning to drop it.
-Jerry