
Here is some code I wrote in Python / Numpy that I translated pretty much directly from MATLAB. When I run the code in MATLAB on my machine, it takes roughly 17 seconds. When I run the code in Python / Numpy on my machine, it takes roughly 233 seconds. Am I not using Numpy effectively? Please look over my Python code to see if I'm using Numpy in an ineffective manner. All this code does is fit the parameter D (the diffusion coefficient) in the heat equation to some synthetically generated data, using an MCMC method.

import numpy as np
from numpy import *
import pylab as py
from pylab import *
import math
import time

def heat(D,u0,q,tdim):
    xdim = np.size(u0)
    Z = np.zeros([xdim,tdim])
    Z[:,0] = u0
    for i in range(1,tdim):
        for j in range(1,xdim-1):
            Z[j,i] = Z[j,i-1] + D*q*(Z[j-1,i-1] - 2*Z[j,i-1] + Z[j+1,i-1])
    return Z
start_time = time.clock()
L = 10
D = 0.5
s = 0.03 # magnitude of noise
Tmax = 0.2
xdim = 25
tdim = 75 
x = np.linspace(0,L,xdim)
t = np.linspace(0,Tmax,tdim)
dt = t[1]-t[0]
dx = x[1]-x[0]
q = dt/(dx**2)
r1 = 0.75*L
r2 = 0.8*L
################################################
## check the stability criterion dt/(dx^2)<.5 ##
################################################
# Define the actual initial temperature distribution
u0 = np.zeros(xdim)
for i in range(0,xdim):
    if x[i] >= r1 and x[i] <= r2:
        u0[i] = 1
xDat = range(1,xdim-1)
tDat = np.array([tdim])
nxDat = len(xDat)
ntDat = 1
tfinal = tdim-1
# synthesize data
Z = heat(D,u0,q,tdim)
u = Z[xDat,tfinal] # solution at the interior points, final time step
uDat = u + s*randn(nxDat)
# MATLAB PLOTTING
#figure(1);surf(x,t,Z); hold on;
#if ntDat>1, mesh(x(xDat),t(tDat),uDat);
#else set(plot3(x(xDat),t(tDat)*ones(1,nxDat),uDat,'r-o'),'LineWidth',3);
#end; hold off; drawnow
#MCMC run
N = 10000
m = 100
XD = 1.0
X = np.zeros(N)
X[0] = XD
Z = heat(XD,u0,q,tdim)
u = Z[xDat,tfinal]
oLLkd = sum(sum(-(u-uDat)**2))/(2*s**2)
LL = np.zeros(N)
LL[0] = oLLkd
# random walk step size
w = 0.1
for n in range(1,N):
    XDp = XD + w*(2*rand(1)-1)
    if XDp > 0:
        Z = heat(XDp,u0,q,tdim)
        u = Z[xDat,tfinal]
        nLLkd = sum(sum(-(u-uDat)**2))/(2*s**2)
        alpha = exp(nLLkd-oLLkd)
        if random() < alpha:
            XD = XDp
            oLLkd = nLLkd
            CZ = Z
    X[n] = XD
    LL[n] = oLLkd
print time.clock() - start_time, "seconds"
asked Oct 19, 2012 at 4:40
2 comments:
  • Not sure how this works in python, but could you run a kind of profile on the code to see which lines use most computation time? — Commented Jan 31, 2013 at 13:52
  • I translated this back to Matlab, and ran it on Octave. With that double loop it was very slow, slower than numpy. My guess is that Matlab (probably a newer version) is compiling the loops. The kind of vectorization that classic Matlab required is no longer essential to fast code. Numpy and Octave still require thinking in terms of vector and matrix operations. — Commented Aug 23, 2013 at 5:37

2 Answers


In the heat function, simply vectorizing the inner loop drops the time from 340 sec to 56 sec, a 6x improvement. The function starts by defining the first column of Z and calculates each subsequent column from the previous one (modeling heat diffusion).

def heat(D,u0,q,tdim):
    xdim = np.size(u0)
    Z = np.zeros([xdim,tdim])
    Z[:,0] = u0
    for i in range(1,tdim):
        #for j in range(1,xdim-1):
        #    Z[j,i] = Z[j,i-1] + D*q*(Z[j-1,i-1] - 2*Z[j,i-1] + Z[j+1,i-1])
        J = np.arange(1, xdim-1)
        Z[J,i] = Z[J,i-1] + D*q*( Z[J-1,i-1] - 2*Z[J,i-1] + Z[J+1,i-1] )
    return Z

Some added improvement (a 10x speedup overall) comes from streamlining the indexing:

    Z1 = Z[:,i-1]
    Z[1:-1,i] = Z1[1:-1] + D*q*(Z1[:-2] - 2*Z1[1:-1] + Z1[2:])
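For reference, here is a sketch of the full function with that streamlined indexing folded in (the name `heat_sliced` is mine, not from the post). It should produce exactly the same result as the double-loop version, since both leave the boundary rows untouched:

```python
import numpy as np

def heat_sliced(D, u0, q, tdim):
    # explicit finite-difference stepper; inner loop replaced by offset slices
    xdim = np.size(u0)
    Z = np.zeros([xdim, tdim])
    Z[:, 0] = u0
    for i in range(1, tdim):
        Z1 = Z[:, i-1]
        # update interior points only; the boundary rows stay at zero,
        # just as in the original nested-loop version
        Z[1:-1, i] = Z1[1:-1] + D*q*(Z1[:-2] - 2*Z1[1:-1] + Z1[2:])
    return Z
```

Each slice (`Z1[:-2]`, `Z1[1:-1]`, `Z1[2:]`) is a view, so no per-element Python overhead remains in the inner dimension.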

Better yet: this drops the time to 7 sec, a 45x improvement. It constructs a matrix with three diagonals and applies it repeatedly to the u vector (with a dot product).

def heat(D,u0,q,tdim):
    # drops time to 7 sec
    N = np.size(u0)
    dq = D*q
    # tridiagonal update matrix: identity plus the second-difference stencil
    A = np.eye(N,N,0) + dq*(np.eye(N,N,-1) + np.eye(N,N,1) - 2*np.eye(N,N,0))
    Z = np.zeros([N,tdim])
    Z[:,0] = u0
    # print u0.shape, A.shape, (A*u0).shape, np.dot(A,u0).shape
    for i in range(1,tdim):
        u0 = np.dot(A,u0)
        Z[:,i] = u0
    return Z

Based on further testing and reading, I think np.dot(A,u0) is using the fast BLAS code.

For larger dimensions (here xdim is only 25), scipy.sparse can be used to make a more compact A matrix. For example, a sparse version of A can be produced with

import scipy.sparse as sp
sp.eye(N, N, 0) + D * q * sp.diags([1, -2, 1], [-1, 0, 1], shape=(N, N))

But there isn't a speed advantage at this small size.
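As a sanity check, a small sketch (the values of N, D, and q are illustrative, matching the problem size above) confirming that the sparse construction matches the dense A built in the answer:

```python
import numpy as np
import scipy.sparse as sp

N, D, q = 25, 0.5, 0.3   # illustrative values, not from the original post
# dense tridiagonal update matrix, as built in the answer above
A_dense = np.eye(N, N, 0) + D*q*(np.eye(N, N, -1) + np.eye(N, N, 1) - 2*np.eye(N, N, 0))
# sparse equivalent: identity plus the scaled second-difference stencil
A_sparse = sp.eye(N, N, 0) + D*q*sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(N, N))
assert np.allclose(A_sparse.toarray(), A_dense)
```

The sparse form stores only the three diagonals (O(N) memory instead of O(N^2)), which is what pays off at larger N.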

answered Aug 22, 2013 at 20:13

The first obvious thing that jumped out was the use of nested for loops to work with array elements. Replace them with whole-array arithmetic, using offset slices to shift the array (dropping one element from one end or the other).

The first and last elements will require special attention, taking a few minutes of development time but minuscule execution time.

I suspect this alone will improve your execution speed greatly.
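A minimal sketch of that technique (the `step` function and the numbers below are mine, for illustration): the offset slices `u[:-2]`, `u[1:-1]`, `u[2:]` act as the left/centre/right neighbours, and the two end elements get the "special attention" of simply being held fixed here:

```python
import numpy as np

def step(u, Dq):
    # one explicit diffusion step using whole-array arithmetic with offset slices
    v = u.copy()
    v[1:-1] = u[1:-1] + Dq*(u[:-2] - 2*u[1:-1] + u[2:])
    # first and last elements are left unchanged (fixed boundary)
    return v

u = np.zeros(8)
u[4] = 1.0
u = step(u, 0.25)   # the unit spike spreads into its two neighbours
```

Because the slice expressions are evaluated on the whole array at once, the Python-level loop over array elements disappears entirely.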

answered Oct 19, 2012 at 4:49