The "vectorizing" of fancy indexing by Python's NumPy library sometimes gives unexpected results.
For example:
import numpy
a = numpy.zeros((1000,4), dtype='uint32')
b = numpy.zeros((1000,4), dtype='uint32')
i = numpy.random.random_integers(0,999,1000)
j = numpy.random.random_integers(0,3,1000)
a[i,j] += 1
for k in xrange(1000):
b[i[k],j[k]] += 1
It gives different results in the arrays a and b: in a, each (i, j) pair that occurs ends up as 1 regardless of how many times it repeats, whereas in b the repeats are counted.
This is easily verified as follows:
>>> numpy.sum(a)
883
>>> numpy.sum(b)
1000
It is also notable that the fancy indexing version is almost two orders of magnitude faster than the for loop.
Is there an efficient way for NumPy to compute the repeat counts, as the for loop does in the example above?
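As an aside for later readers: NumPy 1.8 and newer ship numpy.add.at (the unbuffered form of ufunc application), which postdates this question but does exactly what the loop does. A minimal sketch:

```python
import numpy

a = numpy.zeros((1000, 4), dtype='uint32')
b = numpy.zeros((1000, 4), dtype='uint32')
rng = numpy.random.RandomState(0)
i = rng.randint(0, 1000, 1000)
j = rng.randint(0, 4, 1000)

# Unbuffered in-place addition: repeated (i, j) pairs are each counted.
numpy.add.at(a, (i, j), 1)

# The explicit loop for comparison.
for k in range(1000):
    b[i[k], j[k]] += 1

print(numpy.sum(a))   # prints 1000: every sample is counted
print((a == b).all()) # prints True: matches the loop exactly
```

Note that numpy.add.at is correct but not especially fast; it is typically much slower than plain buffered fancy indexing.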
2 Answers
Ben,
The following modification to your code is a more accurate reflection of my larger problem:
from timeit import timeit
import numpy
# Dimensions
ROWS = 10000000
SAMPLES = 10000
COLS = 4
# Number of times to run timeit
TEST_COUNT = 100
# Matrices
a = numpy.zeros((ROWS, COLS), dtype='uint32')
b = numpy.zeros((ROWS, COLS), dtype='uint32')
c = numpy.zeros((ROWS, COLS), dtype='uint32')
# Indices
i = numpy.random.random_integers(0, ROWS - 1, SAMPLES)
j = numpy.random.random_integers(0, COLS - 1, SAMPLES)
# Fancy
def fancy():
    a[i,j] += 1
print "Fancy:", timeit("fancy()", setup="from __main__ import fancy", number=TEST_COUNT)
# Loop
def loop():
    for k in xrange(SAMPLES):
        b[i[k],j[k]] += 1
print "Loop:", timeit("loop()", setup="from __main__ import loop", number=TEST_COUNT)
# Flat
def flat():
    flatIndices = i*COLS + j
    sums = numpy.bincount(flatIndices)
    c.flat[:len(sums)] += sums
print "Flat:", timeit("flat()", setup="from __main__ import flat", number=TEST_COUNT)
# Check that for_loop and custom are equivalent
print "Equivalent:", (c == b).all()
The main difference is that with these dimensions probably few samples are actually repeated; let's ignore that for now. On my machine, your code gave the following timings:
Fancy: 0.109203100204
Loop: 7.37937998772
Flat: 126.571173906
Equivalent: True
This slowdown is largely attributable to the large sparse array output by bincount().
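The cost is visible directly: numpy.bincount returns an array of length max(flatIndices) + 1, so with ROWS * COLS around 40 million a huge, almost entirely zero array is allocated on every call. A small sketch (with smaller, stand-in dimensions to keep it cheap):

```python
import numpy

# Stand-in dimensions for illustration; the answer above uses ROWS = 10000000.
ROWS, COLS, SAMPLES = 1000000, 4, 10000
rng = numpy.random.RandomState(0)
i = rng.randint(0, ROWS, SAMPLES)
j = rng.randint(0, COLS, SAMPLES)

flatIndices = i * COLS + j
sums = numpy.bincount(flatIndices)

# bincount's output spans 0..max(flatIndices), not just the indices that
# actually occur, so only a tiny fraction of the result is nonzero.
print(len(sums))                  # max(flatIndices) + 1, close to ROWS * COLS
print(numpy.count_nonzero(sums))  # at most SAMPLES
```

Passing only the occupied indices (as in the histogram variant below in the other answer's second edit) or a sparse accumulation avoids materializing that dense array.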
Comment (Benjamin Kloster, Jun 12, 2012): See the second edit in my post. I'm off to bed.
This is what I came up with (see the flat function at the bottom):
from timeit import timeit
import numpy
# Dimensions
ROWS = 1000
COLS = 4
# Number of times to run timeit
TEST_COUNT = 100
# Matrices
a = numpy.zeros((ROWS, COLS), dtype='uint32')
b = numpy.zeros((ROWS, COLS), dtype='uint32')
c = numpy.zeros((ROWS, COLS), dtype='uint32')
# Indices
i = numpy.random.random_integers(0, ROWS - 1, ROWS)
j = numpy.random.random_integers(0, COLS - 1, ROWS)
# Fancy
def fancy():
    a[i,j] += 1
print "Fancy:", timeit("fancy()", setup="from __main__ import fancy", number=TEST_COUNT)
# Loop
def loop():
    for k in xrange(ROWS):
        b[i[k],j[k]] += 1
print "Loop:", timeit("loop()", setup="from __main__ import loop", number=TEST_COUNT)
# Flat
def flat():
    flatIndices = i*COLS + j
    sums = numpy.bincount(flatIndices)
    c.flat[:len(sums)] += sums
print "Flat:", timeit("flat()", setup="from __main__ import flat", number=TEST_COUNT)
# Check that for_loop and custom are equivalent
print "Equivalent:", (c == b).all()
On my machine, that prints:
Fancy: 0.0117284889374
Loop: 0.937140445391
Flat: 0.0165873246295
Equivalent: True
So the flat version is about 50% slower than fancy, but much faster than loop. It does take up some memory for the flatIndices and sums arrays, though. Note that c.flat does not create a copy of c; it only provides a linear interface to the array.
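That c.flat writes through to the original array rather than a copy is easy to check; a tiny sketch:

```python
import numpy

c = numpy.zeros((3, 4), dtype='uint32')
c.flat[5] += 7   # linear index 5 is row 1, column 1 in a 3x4 array

print(c[1, 1])   # prints 7: the write went through to c itself
```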
Edit: Out of curiosity, I ran another test with COLS = 1e6, with the following results:
Fancy: 24.6063425241
Loop: N/A
Flat: 22.4316268267
Memory usage according to the Windows task manager (not the most accurate measuring tool, I know) was at about 80 MB base-line, spiking up to 130 MB during flat().
Edit Nr 2: Another attempt with your benchmark:
def flat():
    flatIndices = i*COLS + j
    uniqueIndices = numpy.unique(flatIndices)
    bins = numpy.append(uniqueIndices, uniqueIndices.max() + 1)
    histo = numpy.histogram(flatIndices, bins)
    c.flat[uniqueIndices] += histo[0]
This gives me
Fancy: 0.143004997089
Loop: 9.48711325557
Flat: 0.474580977267
Equivalent: True
This looks much better. Unfortunately, I couldn't find a way to avoid the bins array, since numpy.histogram treats the last bin as a closed interval. So if you only pass uniqueIndices as bins, histogram will put both the last and second-to-last index in the same bin, which is usually wrong.
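The closed last bin is easy to demonstrate: with edges that stop at the largest value, the two largest values are merged into one bin, while appending max + 1 keeps them separate:

```python
import numpy

values = numpy.array([0, 1, 2, 3])

# Edges [0, 1, 2, 3]: the final bin [2, 3] is closed, so 2 and 3 are merged.
counts, _ = numpy.histogram(values, bins=[0, 1, 2, 3])
print(counts)   # prints [1 1 2]

# Appending max + 1 as an extra edge gives each value its own bin.
counts, _ = numpy.histogram(values, bins=[0, 1, 2, 3, 4])
print(counts)   # prints [1 1 1 1]
```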
Comment: numpy.sum(b) == j.size? Because for each element in i, j some element of b is incremented once.
Comment (author): Yes, numpy.sum(b) always adds up to j.size. However, the result I need is the final counts in b. Is there a fast way to compute b?
Comment: Ah, you're interested not in sum(b), but in b itself. Me blockhead today.