The "vectorizing" of fancy indexing by Python's NumPy library sometimes gives unexpected results.
For example:
import numpy
a = numpy.zeros((1000,4), dtype='uint32')
b = numpy.zeros((1000,4), dtype='uint32')
i = numpy.random.random_integers(0,999,1000)
j = numpy.random.random_integers(0,3,1000)
a[i,j] += 1
for k in xrange(1000):
b[i[k],j[k]] += 1
It gives different results in the arrays a and b: in a, each (i, j) pair that occurs ends up as 1 regardless of how many times it repeats, whereas in b the repeats are counted.
This is easily verified as follows:
>>> numpy.sum(a)
883
>>> numpy.sum(b)
1000
It is also notable that the fancy indexing version is almost two orders of magnitude faster than the for loop.
Is there an efficient way for NumPy to compute the repeat counts, as the for loop does in the example above?
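As an aside for later readers: NumPy 1.8 and newer ship numpy.add.at (the unbuffered form of ufunc application), which postdates this question but does exactly what the loop does. A minimal sketch:

```python
import numpy

a = numpy.zeros((1000, 4), dtype='uint32')
b = numpy.zeros((1000, 4), dtype='uint32')
rng = numpy.random.RandomState(0)
i = rng.randint(0, 1000, 1000)
j = rng.randint(0, 4, 1000)

# Unbuffered in-place addition: repeated (i, j) pairs are each counted.
numpy.add.at(a, (i, j), 1)

# The explicit loop for comparison.
for k in range(1000):
    b[i[k], j[k]] += 1

print(numpy.sum(a))   # prints 1000: every sample is counted
print((a == b).all()) # prints True: matches the loop exactly
```

Note that numpy.add.at is correct but not especially fast; it is typically much slower than plain buffered fancy indexing.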
2 Answers
Ben,
The following modification to your code is a more accurate reflection of my larger problem:
from timeit import timeit
import numpy
# Dimensions
ROWS = 10000000
SAMPLES = 10000
COLS = 4
# Number of times to run timeit
TEST_COUNT = 100
# Matrices
a = numpy.zeros((ROWS, COLS), dtype='uint32')
b = numpy.zeros((ROWS, COLS), dtype='uint32')
c = numpy.zeros((ROWS, COLS), dtype='uint32')
# Indices
i = numpy.random.random_integers(0, ROWS - 1, SAMPLES)
j = numpy.random.random_integers(0, COLS - 1, SAMPLES)
# Fancy
def fancy():
    a[i,j] += 1
print "Fancy:", timeit("fancy()", setup="from __main__ import fancy", number=TEST_COUNT)
# Loop
def loop():
    for k in xrange(SAMPLES):
        b[i[k],j[k]] += 1
print "Loop:", timeit("loop()", setup="from __main__ import loop", number=TEST_COUNT)
# Flat
def flat():
    flatIndices = i*COLS + j
    sums = numpy.bincount(flatIndices)
    c.flat[:len(sums)] += sums
print "Flat:", timeit("flat()", setup="from __main__ import flat", number=TEST_COUNT)
# Check that for_loop and custom are equivalent
print "Equivalent:", (c == b).all()
The main difference is that with these dimensions probably few samples are actually repeated; let's ignore that for now. On my machine, your code gave the following timings:
Fancy: 0.109203100204
Loop: 7.37937998772
Flat: 126.571173906
Equivalent: True
This slowdown is largely attributable to the large sparse array output by bincount().
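The cost is visible directly: numpy.bincount returns an array of length max(flatIndices) + 1, so with ROWS * COLS around 40 million a huge, almost entirely zero array is allocated on every call. A small sketch (with smaller, stand-in dimensions to keep it cheap):

```python
import numpy

# Stand-in dimensions for illustration; the answer above uses ROWS = 10000000.
ROWS, COLS, SAMPLES = 1000000, 4, 10000
rng = numpy.random.RandomState(0)
i = rng.randint(0, ROWS, SAMPLES)
j = rng.randint(0, COLS, SAMPLES)

flatIndices = i * COLS + j
sums = numpy.bincount(flatIndices)

# bincount's output spans 0..max(flatIndices), not just the indices that
# actually occur, so only a tiny fraction of the result is nonzero.
print(len(sums))                  # max(flatIndices) + 1, close to ROWS * COLS
print(numpy.count_nonzero(sums))  # at most SAMPLES
```

Passing only the occupied indices (as in the histogram variant below in the other answer's second edit) or a sparse accumulation avoids materializing that dense array.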
Comment (Benjamin Kloster, Jun 12, 2012): See the second edit in my post. I'm off to bed.
This is what I came up with (see the flat function at the bottom):
from timeit import timeit
import numpy
# Dimensions
ROWS = 1000
COLS = 4
# Number of times to run timeit
TEST_COUNT = 100
# Matrices
a = numpy.zeros((ROWS, COLS), dtype='uint32')
b = numpy.zeros((ROWS, COLS), dtype='uint32')
c = numpy.zeros((ROWS, COLS), dtype='uint32')
# Indices
i = numpy.random.random_integers(0, ROWS - 1, ROWS)
j = numpy.random.random_integers(0, COLS - 1, ROWS)
# Fancy
def fancy():
    a[i,j] += 1
print "Fancy:", timeit("fancy()", setup="from __main__ import fancy", number=TEST_COUNT)
# Loop
def loop():
    for k in xrange(ROWS):
        b[i[k],j[k]] += 1
print "Loop:", timeit("loop()", setup="from __main__ import loop", number=TEST_COUNT)
# Flat
def flat():
    flatIndices = i*COLS + j
    sums = numpy.bincount(flatIndices)
    c.flat[:len(sums)] += sums
print "Flat:", timeit("flat()", setup="from __main__ import flat", number=TEST_COUNT)
# Check that for_loop and custom are equivalent
print "Equivalent:", (c == b).all()
On my machine, that prints:
Fancy: 0.0117284889374
Loop: 0.937140445391
Flat: 0.0165873246295
Equivalent: True
So the flat version is about 50% slower than fancy, but much faster than loop. It does take up some memory for the flatIndices and sums arrays, though. Note that c.flat does not create a copy of c; it only provides a linear interface to the array.
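That c.flat writes through to the original array rather than a copy is easy to check; a tiny sketch:

```python
import numpy

c = numpy.zeros((3, 4), dtype='uint32')
c.flat[5] += 7   # linear index 5 is row 1, column 1 in a 3x4 array

print(c[1, 1])   # prints 7: the write went through to c itself
```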
Edit: Out of curiosity, I ran another test with COLS = 1e6, with the following results:
Fancy: 24.6063425241
Loop: N/A
Flat: 22.4316268267
Memory usage according to the Windows task manager (not the most accurate measuring tool, I know) was at about 80 MB base-line, spiking up to 130 MB during flat().
Edit Nr 2: Another attempt with your benchmark:
def flat():
    flatIndices = i*COLS + j
    uniqueIndices = numpy.unique(flatIndices)
    bins = numpy.append(uniqueIndices, uniqueIndices.max() + 1)
    histo = numpy.histogram(flatIndices, bins)
    c.flat[uniqueIndices] += histo[0]
This gives me
Fancy: 0.143004997089
Loop: 9.48711325557
Flat: 0.474580977267
Equivalent: True
This looks much better. Unfortunately, I couldn't find a way to avoid the bins array, since numpy.histogram treats the last bin as a closed interval. So if you only pass uniqueIndices as bins, histogram will put both the last and second-to-last index in the same bin, which is usually wrong.
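The closed last bin is easy to demonstrate: with edges that stop at the largest value, the two largest values are merged into one bin, while appending max + 1 keeps them separate:

```python
import numpy

values = numpy.array([0, 1, 2, 3])

# Edges [0, 1, 2, 3]: the final bin [2, 3] is closed, so 2 and 3 are merged.
counts, _ = numpy.histogram(values, bins=[0, 1, 2, 3])
print(counts)   # prints [1 1 2]

# Appending max + 1 as an extra edge gives each value its own bin.
counts, _ = numpy.histogram(values, bins=[0, 1, 2, 3, 4])
print(counts)   # prints [1 1 1 1]
```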
Comment: numpy.sum(b) == j.size? Because for each element in i, j some element of b is incremented once.
Comment (author): Yes, numpy.sum(b) always adds up to j.size. However, the result I need is the final counts in b. Is there a fast way to compute b?
Comment: Ah, you're interested not in sum(b), but in b itself. Me blockhead today.