
Here is some code I wrote in Python / Numpy that I translated pretty much directly from MATLAB. When I run the code in MATLAB on my machine, it takes roughly 17 seconds. When I run the code in Python / Numpy on my machine, it takes roughly 233 seconds. Am I not using Numpy effectively? Please look over my Python code to see if I'm using Numpy in an ineffective manner. All this code does is fit the parameter D (the diffusion coefficient) in the heat equation to some synthetically generated data, using an MCMC method.

import numpy as np
from numpy import *
import pylab as py
from pylab import *
import math
import time

def heat(D,u0,q,tdim):
    xdim = np.size(u0)
    Z = np.zeros([xdim,tdim])
    Z[:,0] = u0
    for i in range(1,tdim):
        for j in range(1,xdim-1):
            Z[j,i] = Z[j,i-1] + D*q*(Z[j-1,i-1] - 2*Z[j,i-1] + Z[j+1,i-1])
    return Z
start_time = time.clock()
L = 10
D = 0.5
s = 0.03 # magnitude of noise
Tmax = 0.2
xdim = 25
tdim = 75 
x = np.linspace(0,L,xdim)
t = np.linspace(0,Tmax,tdim)
dt = t[1]-t[0]
dx = x[1]-x[0]
q = dt/(dx**2)
r1 = 0.75*L
r2 = 0.8*L
################################################
## check the stability criterion dt/(dx^2)<.5 ##
################################################
# Define the actual initial temperature distribution
u0 = np.zeros(xdim)
for i in range(0,xdim):
    if x[i] >= r1 and x[i] <= r2:
        u0[i] = 1
xDat = range(1,xdim-1)
tDat = np.array([tdim])
nxDat = len(xDat)
ntDat = 1
tfinal = tdim-1
# synthesize data
Z = heat(D,u0,q,tdim)
u = Z[xDat,tfinal] # solution at the interior points, final time step
uDat = u + s*randn(nxDat)
# MATLAB PLOTTING
#figure(1);surf(x,t,Z); hold on;
#if ntDat>1, mesh(x(xDat),t(tDat),uDat);
#else set(plot3(x(xDat),t(tDat)*ones(1,nxDat),uDat,'r-o'),'LineWidth',3);
#end; hold off; drawnow
#MCMC run
N = 10000
m = 100
XD = 1.0
X = np.zeros(N)
X[0] = XD
Z = heat(XD,u0,q,tdim)
u = Z[xDat,tfinal]
oLLkd = sum(sum(-(u-uDat)**2))/(2*s**2)
LL = np.zeros(N)
LL[0] = oLLkd
# random walk step size
w = 0.1
for n in range(1,N):
    XDp = XD + w*(2*rand(1)-1)
    if XDp > 0:
        Z = heat(XDp,u0,q,tdim)
        u = Z[xDat,tfinal]
        nLLkd = sum(sum(-(u-uDat)**2))/(2*s**2)
        alpha = exp(nLLkd-oLLkd)
        if random() < alpha:
            XD = XDp
            oLLkd = nLLkd
            CZ = Z
    X[n] = XD
    LL[n] = oLLkd
print time.clock() - start_time, "seconds"
asked Oct 19, 2012 at 4:40
2 comments:
  • Not sure how this works in python, but could you run a kind of profile on the code to see which lines use most computation time? — Commented Jan 31, 2013 at 13:52
  • I translated this back to Matlab, and ran it on Octave. With that double loop it was very slow, slower than numpy. My guess is that Matlab (probably a newer version) is compiling the loops. The kind of vectorization that classic Matlab required is no longer essential to fast code. Numpy and Octave still require thinking in terms of vector and matrix operations. — Commented Aug 23, 2013 at 5:37

2 Answers


In the heat function, simply vectorizing the inner loop drops the time from 340 sec to 56 sec, a 6x improvement. The function starts by defining the first column of Z and calculates each subsequent column from the previous one (modeling heat diffusion).

def heat(D,u0,q,tdim):
    xdim = np.size(u0)
    Z = np.zeros([xdim,tdim])
    Z[:,0] = u0
    for i in range(1,tdim):
        #for j in range(1,xdim-1):
        #    Z[j,i] = Z[j,i-1] + D*q*(Z[j-1,i-1] - 2*Z[j,i-1] + Z[j+1,i-1])
        J = np.arange(1, xdim-1)
        Z[J,i] = Z[J,i-1] + D*q*( Z[J-1,i-1] - 2*Z[J,i-1] + Z[J+1,i-1] )
    return Z

Some added improvement (a 10x speedup overall) comes from streamlining the indexing:

    Z1 = Z[:,i-1]
    Z[1:-1,i] = Z1[1:-1] + D*q*(Z1[:-2] - 2*Z1[1:-1] + Z1[2:])
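For reference, here is a sketch of the full function with that streamlined indexing folded in (the name `heat_sliced` is mine, not from the post). It should produce exactly the same result as the double-loop version, since both leave the boundary rows untouched:

```python
import numpy as np

def heat_sliced(D, u0, q, tdim):
    # explicit finite-difference stepper; inner loop replaced by offset slices
    xdim = np.size(u0)
    Z = np.zeros([xdim, tdim])
    Z[:, 0] = u0
    for i in range(1, tdim):
        Z1 = Z[:, i-1]
        # update interior points only; the boundary rows stay at zero,
        # just as in the original nested-loop version
        Z[1:-1, i] = Z1[1:-1] + D*q*(Z1[:-2] - 2*Z1[1:-1] + Z1[2:])
    return Z
```

Each slice (`Z1[:-2]`, `Z1[1:-1]`, `Z1[2:]`) is a view, so no per-element Python overhead remains in the inner dimension.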

Better yet: this drops the time to 7 sec, a 45x improvement. It constructs a matrix with three diagonals and applies it repeatedly to the u vector (with a dot product).

def heat(D,u0,q,tdim):
    # drops time to 7 sec
    N = np.size(u0)
    dq = D*q
    # tridiagonal update matrix: identity plus the second-difference stencil
    A = np.eye(N,N,0) + dq*(np.eye(N,N,-1) + np.eye(N,N,1) - 2*np.eye(N,N,0))
    Z = np.zeros([N,tdim])
    Z[:,0] = u0
    # print u0.shape, A.shape, (A*u0).shape, np.dot(A,u0).shape
    for i in range(1,tdim):
        u0 = np.dot(A,u0)
        Z[:,i] = u0
    return Z

Based on further testing and reading, I think np.dot(A,u0) is using the fast BLAS code.

For larger dimensions (here xdim is only 25), scipy.sparse can be used to make a more compact A matrix. For example, a sparse version of A can be produced with

import scipy.sparse as sp
sp.eye(N, N, 0) + D * q * sp.diags([1, -2, 1], [-1, 0, 1], shape=(N, N))

But there isn't a speed advantage at this small size.
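As a sanity check, a small sketch (the values of N, D, and q are illustrative, matching the problem size above) confirming that the sparse construction matches the dense A built in the answer:

```python
import numpy as np
import scipy.sparse as sp

N, D, q = 25, 0.5, 0.3   # illustrative values, not from the original post
# dense tridiagonal update matrix, as built in the answer above
A_dense = np.eye(N, N, 0) + D*q*(np.eye(N, N, -1) + np.eye(N, N, 1) - 2*np.eye(N, N, 0))
# sparse equivalent: identity plus the scaled second-difference stencil
A_sparse = sp.eye(N, N, 0) + D*q*sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(N, N))
assert np.allclose(A_sparse.toarray(), A_dense)
```

The sparse form stores only the three diagonals (O(N) memory instead of O(N^2)), which is what pays off at larger N.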

answered Aug 22, 2013 at 20:13

The first obvious thing that jumped out was the use of nested for loops to work with array elements. Replace them with whole-array arithmetic, using offset slices to shift the array (dropping one element from one end or the other).

The first and last elements will require special attention, taking a few minutes of development time but minuscule execution time.

I suspect this alone will improve your execution speed greatly.
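A minimal sketch of that technique (the `step` function and the numbers below are mine, for illustration): the offset slices `u[:-2]`, `u[1:-1]`, `u[2:]` act as the left/centre/right neighbours, and the two end elements get the "special attention" of simply being held fixed here:

```python
import numpy as np

def step(u, Dq):
    # one explicit diffusion step using whole-array arithmetic with offset slices
    v = u.copy()
    v[1:-1] = u[1:-1] + Dq*(u[:-2] - 2*u[1:-1] + u[2:])
    # first and last elements are left unchanged (fixed boundary)
    return v

u = np.zeros(8)
u[4] = 1.0
u = step(u, 0.25)   # the unit spike spreads into its two neighbours
```

Because the slice expressions are evaluated on the whole array at once, the Python-level loop over array elements disappears entirely.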

answered Oct 19, 2012 at 4:49