I have the following optimization problem: given two np.arrays X, Y and a function K, I would like to compute, as fast as possible, the Gram matrix gram_matrix whose (i, j)-th element is K(X[i], Y[j]).
Here is an implementation using nested for-loops, which are acknowledged to be the slowest way to solve this kind of problem.
import numpy as np

def proxy_kernel(X, Y, K):
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i, j] = K(x, y)
    return gram_matrix
2 Answers
You can avoid the nested loops using numpy.meshgrid to build a table of entries in x and y, and numpy.vectorize to apply a function to all entries in the table:
def tabulate(x, y, f):
    """Return a table of f(x, y)."""
    return np.vectorize(f)(*np.meshgrid(x, y, sparse=True))
For example:
>>> import operator
>>> tabulate(np.arange(1, 5), np.arange(1, 4), operator.mul)
array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12]])
This is about twice as fast as proxy_kernel:
>>> from timeit import timeit
>>> X = Y = np.arange(2000)
>>> timeit(lambda:proxy_kernel(X, Y, operator.mul), number=1)
2.174600816098973
>>> timeit(lambda:tabulate(X, Y, operator.mul), number=1)
0.9889162541367114
but it still has to evaluate the slow Python code in f for each entry in the table. To really speed things up you need to use whole-array operations throughout. Depending on how you've written f, it may be possible to do this directly. For example, if you have something like:
def f(x, y):
    return np.exp(-0.5 * np.abs(x - y))
then this is already suitable for applying to whole arrays:
>>> f(*np.meshgrid(np.arange(4), np.arange(4), sparse=True))
array([[ 1.        ,  0.60653066,  0.36787944,  0.22313016],
       [ 0.60653066,  1.        ,  0.60653066,  0.36787944],
       [ 0.36787944,  0.60653066,  1.        ,  0.60653066],
       [ 0.22313016,  0.36787944,  0.60653066,  1.        ]])
and this is about five times as fast as using numpy.vectorize:
>>> timeit(lambda:f(*np.meshgrid(X, Y, sparse=True)), number=1)
0.1973482659086585
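For a whole-array kernel like f, you can even skip meshgrid and use NumPy broadcasting directly. This is a sketch (not from the answer above, but using the same example kernel) that produces the table by inserting axes explicitly; row i corresponds to X[i], matching proxy_kernel's orientation:

```python
import numpy as np

def f(x, y):
    # Example kernel from above: uses only element-wise array operations.
    return np.exp(-0.5 * np.abs(x - y))

X = np.arange(4)
Y = np.arange(4)

# X[:, None] has shape (4, 1) and Y[None, :] has shape (1, 4), so
# broadcasting gives f a (4, 4) grid of differences in one C-level pass.
gram = f(X[:, None], Y[None, :])
```

The choice of which array gets the new trailing axis decides whether rows follow X or Y, so it is worth checking the orientation against the loop version once.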
Comment (hpaulj, May 11, 2015): This is similar to the answer I gave this poster (different name) on SO - stackoverflow.com/a/30088791/901925. vectorize can double the speed, but real speedup requires getting at the internals of K.
Since this is Python, you will almost always be faster if you can make use of code implemented in C.
For instance, you might be able to convert X (Y) into a matrix where the row (column) is repeated multiple times to fit the size of your output matrix. Then you can maybe find a C-implemented function somewhere that combines matrices element-wise with a user-provided kernel, and that might save a little time for looping.
I just found this in the docs: http://docs.scipy.org/doc/numpy-1.9.1/numpy-ref-1.9.1.pdf#section*.1764 This seems to do the repeating already(?).
But in any case, your solution seems pretty good already. You might be able to squeeze a bit for the "looping over matrices" part, and for the cost of assignment to particular positions in the matrices (including Python function call overhead), but the kernel K, which I assume outweighs that cost by far, will still need to be called the same amount of times. :)
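The row/column-repetition idea above can be sketched with np.tile; this is an illustration under the assumption that the kernel is an element-wise operation (here plain multiplication stands in for K):

```python
import numpy as np

X = np.arange(1, 4)  # shape (3,)
Y = np.arange(1, 5)  # shape (4,)

# Repeat X down the rows and Y across the columns so both arrays
# already have the output shape (len(X), len(Y)).
X_rep = np.tile(X[:, None], (1, Y.shape[0]))  # shape (3, 4)
Y_rep = np.tile(Y[None, :], (X.shape[0], 1))  # shape (3, 4)

# Any C-implemented element-wise operation can now combine them in one call.
gram = X_rep * Y_rep
```

In practice NumPy broadcasting makes the explicit tiling unnecessary (X[:, None] * Y[None, :] gives the same result without materializing the repeated copies), but the tiled version makes the "repeat to output size" idea concrete.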
Comment: What are X.ndim and Y.ndim? What kinds of inputs does K take? Scalar, 1d vectors, something higher?
Comment: scikit-learn has pairwise metrics for this: scikit-learn.org/stable/modules/metrics.html