I have the following optimization problem: given two np.arrays X, Y and a function K, I would like to compute, as fast as possible, the Gram matrix gram_matrix whose (i, j)-th element is K(X[i], Y[j]).
Here is an implementation using nested for-loops, which are acknowledged to be the slowest way to solve this kind of problem.
import numpy as np

def proxy_kernel(X, Y, K):
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i, j] = K(x, y)
    return gram_matrix
2 Answers
You can avoid the nested loops using numpy.meshgrid to build a table of entries in x and y, and numpy.vectorize to apply a function to all entries in the table:
def tabulate(x, y, f):
    """Return a table of f(x, y)."""
    return np.vectorize(f)(*np.meshgrid(x, y, sparse=True))
For example:
>>> import operator
>>> tabulate(np.arange(1, 5), np.arange(1, 4), operator.mul)
array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12]])
This is about twice as fast as proxy_kernel:
>>> from timeit import timeit
>>> X = Y = np.arange(2000)
>>> timeit(lambda:proxy_kernel(X, Y, operator.mul), number=1)
2.174600816098973
>>> timeit(lambda:tabulate(X, Y, operator.mul), number=1)
0.9889162541367114
but it still has to evaluate the slow Python code in f for each entry in the table. To really speed things up you need to use whole-array operations throughout. Depending on how you've written f, it may be possible to do this directly. For example, if you have something like:
def f(x, y):
    return np.exp(-0.5 * np.abs(x - y))
then this is already suitable for applying to whole arrays:
>>> f(*np.meshgrid(np.arange(4), np.arange(4), sparse=True))
array([[ 1.        ,  0.60653066,  0.36787944,  0.22313016],
       [ 0.60653066,  1.        ,  0.60653066,  0.36787944],
       [ 0.36787944,  0.60653066,  1.        ,  0.60653066],
       [ 0.22313016,  0.36787944,  0.60653066,  1.        ]])
and this is about five times as fast as using numpy.vectorize:
>>> timeit(lambda:f(*np.meshgrid(X, Y, sparse=True)), number=1)
0.1973482659086585
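For a whole-array kernel like f, you can even skip meshgrid and use NumPy broadcasting directly. This is a sketch (not from the answer above, but using the same example kernel) that produces the table by inserting axes explicitly; row i corresponds to X[i], matching proxy_kernel's orientation:

```python
import numpy as np

def f(x, y):
    # Example kernel from above: uses only element-wise array operations.
    return np.exp(-0.5 * np.abs(x - y))

X = np.arange(4)
Y = np.arange(4)

# X[:, None] has shape (4, 1) and Y[None, :] has shape (1, 4), so
# broadcasting gives f a (4, 4) grid of differences in one C-level pass.
gram = f(X[:, None], Y[None, :])
```

The choice of which array gets the new trailing axis decides whether rows follow X or Y, so it is worth checking the orientation against the loop version once.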
Comment (hpaulj, May 11, 2015): This is similar to the answer I gave this poster (different name) on SO - stackoverflow.com/a/30088791/901925. vectorize can double the speed, but real speedup requires getting at the internals of K.
Since this is Python, you will almost always be faster if you can make use of code implemented in C.
For instance, you might be able to convert X (Y) into a matrix where the row (column) is repeated multiple times to fit the size of your output matrix. Then you can maybe find a C-implemented function somewhere that combines matrices element-wise with a user-provided kernel, and that might save a little time for looping.
I just found this in the docs: http://docs.scipy.org/doc/numpy-1.9.1/numpy-ref-1.9.1.pdf#section*.1764 This seems to do the repeating already(?).
But in any case, your solution seems pretty good already. You might be able to squeeze a bit for the "looping over matrices" part, and for the cost of assignment to particular positions in the matrices (including Python function call overhead), but the kernel K, which I assume outweighs that cost by far, will still need to be called the same amount of times. :)
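The row/column-repetition idea above can be sketched with np.tile; this is an illustration under the assumption that the kernel is an element-wise operation (here plain multiplication stands in for K):

```python
import numpy as np

X = np.arange(1, 4)  # shape (3,)
Y = np.arange(1, 5)  # shape (4,)

# Repeat X down the rows and Y across the columns so both arrays
# already have the output shape (len(X), len(Y)).
X_rep = np.tile(X[:, None], (1, Y.shape[0]))  # shape (3, 4)
Y_rep = np.tile(Y[None, :], (X.shape[0], 1))  # shape (3, 4)

# Any C-implemented element-wise operation can now combine them in one call.
gram = X_rep * Y_rep
```

In practice NumPy broadcasting makes the explicit tiling unnecessary (X[:, None] * Y[None, :] gives the same result without materializing the repeated copies), but the tiled version makes the "repeat to output size" idea concrete.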
Comment: What are X.ndim and Y.ndim? What kinds of inputs does K take? Scalar, 1d vectors, something higher?
Comment: scikit-learn has pairwise metrics for this: scikit-learn.org/stable/modules/metrics.html