I am new to Python. I wrote some code that works, but the algorithm is clearly inefficient. I am stuck with the loop over `iterrows` and I do not see how to get rid of it.
First, a little bit about the data; it looks like this:
```
var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 var16 n
1 30.3 4.1 10.2 2.5 10.8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 6
0.9 10.4 4.1 6.3 3.3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 5
2.5 8.8 13.9 11.8 6.1 26.8 5 2.4 NaN NaN NaN NaN NaN NaN NaN NaN 8
0.5 10.7 44 12.3 5 12.8 8.9 NaN NaN NaN NaN NaN NaN NaN NaN NaN 7
1 24.2 7.5 53.9 7 11.3 47.7 20.6 39.4 4.6 52.5 65.9 NaN NaN NaN NaN 12
16.6 8.7 1.7 39.5 47.4 9 36.7 30.4 4.7 29.9 6.5 32.2 16.2 NaN NaN NaN 13
5.8 48.2 21.3 31.9 7.4 49.3 62.5 67.4 0.5 8.3 NaN NaN NaN NaN NaN NaN 10
1.2 40.9 34.7 6.2 11.5 82.9 68.5 14 4.7 8.1 49.7 38.2 43.2 NaN NaN NaN 13
4.1 20.7 8.2 5.3 16.9 12.5 57.9 11.8 49.4 19.9 20.5 15.7 4.9 38.2 32.4 31.5 16
7 30 22.1 10.3 26.8 2.2 16.6 30.4 3.8 53.3 13.1 7.1 55.4 NaN NaN NaN 13
```
As you can see, `n` is the number of non-missing variables. Here I show only 10 lines of the data, but in practice there are 20,000 lines, and `n` goes up to 24, not 16.
Now a little more on what I am trying to do:
The ultimate goal is to compute \$y = (\alpha, \beta, \theta)\$.
I have three functions `u`, `g` and `h` from which I define `f2`.
I want to find the root \$x = (wc, p_1, \dots, p_n)\$ of this vector:
\$\left(f2(wc, \alpha, \beta, \theta, p_1, var_1),\ f2(wc, \alpha, \beta, \theta, p_2, var_2),\ \dots,\ f2(wc, \alpha, \beta, \theta, p_n, var_n),\ \sum\limits_{i=1}^{n} p_i - 1\right)\$ for each line in the dataset.
So for the first line of the data, where \$n = 6\,ドル I am looking for \$x_1 = (wc_1, p_{1,1}, \dots, p_{6,1})\,ドル which is the root of \$\left(f2(wc_1, \alpha, \beta, \theta, p_{1,1}, var_{1,1}),\ f2(wc_1, \alpha, \beta, \theta, p_{2,1}, var_{2,1}),\ \dots,\ f2(wc_1, \alpha, \beta, \theta, p_{6,1}, var_{6,1}),\ \sum\limits_{i=1}^{6} p_{i,1} - 1\right)\$.
For the second line I look for \$x_2 = (wc_2, p_{1,2}, \dots, p_{5,2})\,ドル and so on.
When I have solved this system for each of the 10 lines in the dataset, I keep all the \$p_1\$ values \$(p_{1,1}, p_{1,2}, \dots, p_{1,10})\$ and look for the \$y = (\alpha, \beta, \theta)\$ that minimizes \$-\sum\limits_{j=1}^{10} \ln(p_{1,j})\$.
Here is my code:
```python
import numpy as np
import scipy.optimize

# data is the pandas DataFrame shown above, loaded elsewhere

# define u
def u(x, theta):
    return (1 - np.exp(-x * theta)) / theta

# define g
def g(p, alpha):
    return p ** alpha

# define h
def h(p, beta):
    return p ** beta

# define f2
def f2(wc, alpha, beta, theta, pi, var):
    return wc - g(pi, alpha) * u(var, theta) - h(1 - pi, beta) * u(-1, theta)

# f finds the root of the vector (wc, p1, ..., pn) for each line of the data
# and gives back -sum ln(p1).
# f1 defines the vector whose root is sought for each line of the data.
def f(y):
    lf = 0
    for index, row in data.iterrows():
        end = int(row['n'])
        end2 = int(row['n'] + 1)
        var = [0] * int(row['n'])
        var2 = [row['var1'], row['var2'], row['var3'], row['var4'],
                row['var5'], row['var6'], row['var7'], row['var8'],
                row['var9'], row['var10'], row['var11'], row['var12'],
                row['var13'], row['var14'], row['var15'], row['var16']]
        for k in range(0, end):
            var[k] = var2[k]

        def f1(x, y):
            tosolve = []
            unity = -1
            alpha = y[0]
            beta = y[1]
            theta = y[2]
            for p in range(1, end2):
                t = p - 1
                new_elem = f2(x[0], alpha, beta, theta, x[p], var[t])
                tosolve.append(new_elem)
                unity = unity + x[p]
            tosolve.append(unity)
            return tosolve

        x0 = [0.1] * int(row['n'] + 1)
        sol = scipy.optimize.root(f1, x0, args=(y,), tol=0.001)
        print sol
        solp1 = sol.x[1]
        lf = lf + np.log(solp1)
    print -lf
    return -lf

# this is the minimisation; it gives back alpha, beta and theta
def main():
    x0 = [1, 0, 0]
    result = scipy.optimize.minimize(f, x0, method='Nelder-Mead')
    return result

print main()
```
The dataset is huge, so looping over all the lines takes much too long. My goal is to avoid looping as much as possible and to perform operations on whole vectors directly.
Thank you for your help and advice.
2 Answers
I don't know how to avoid a loop entirely, but this approach should be faster, at the expense of some extra memory usage.

Basically, using pandas for this is overkill; it will hurt performance for no benefit. So the first thing I do is extract the values you want into numpy arrays. `n` goes into one vector (1D array), while `var1`, `var2`, `var3`, `var4`, and `var5` go into a 2D array. I then pad that 2D array so each row is the maximum possible length of `rap`. I also create an initial `x0` array of the maximum possible length.
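For example, padding a small 2D array out to a maximum row length of 4 looks like this (a toy illustration with made-up numbers, not data from the question):

```python
import numpy as np

arr = np.array([[1.5, 2.0],
                [0.5, 3.5]])
# pad_width is [(rows before, rows after), (cols before, cols after)]:
# add two zero-filled columns on the right, none on the left
padded = np.pad(arr, [(0, 0), (0, 2)], 'constant', constant_values=0)
print(padded)
# [[ 1.5  2.   0.   0. ]
#  [ 0.5  3.5  0.   0. ]]
```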
Next, I use `izip` (or just `zip` in Python 3.x) to iterate over each combination of `n` and `rap`. I then slice `rap` and `x0` down to the right lengths, so they don't need to be recalculated each time. These slices are views, so they take no additional memory.
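To illustrate the claim about views (again a toy example, not code from the solution): slicing a numpy array does not copy its data, so the slice costs almost nothing:

```python
import numpy as np

x0 = np.full(10, 0.1)
head = x0[:6]            # a view into x0: no data is copied
print(head.base is x0)   # True -- head shares x0's memory
head[0] = 99.0
print(x0[0])             # 99.0 -- changes through the view show up in x0
```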
I have also simplified the `f1` function. It does more or less the same thing, just with simpler code, so there won't be a huge speedup there, although avoiding doing the indexing element-by-element each time may give a small speedup.
For `f2`, there is always some overhead to function calls, so when you are doing very simple mathematical operations just once or twice, as you are, it is better to limit them. Further, because of how numpy arrays work, mathematical operations are applied element-by-element, but much faster than an explicit loop. So rather than looping over each element like you do, you can just pass a numpy array to `f2` and it will work. All I had to do there was merge the helper functions into `f2` and call `f2` with the arrays as arguments rather than looping manually. It is these changes that will most likely give you the largest performance improvement.
```python
import numpy as np
import scipy.optimize
from itertools import izip  # just use the built-in zip in Python 3

def f2(wc, alpha, beta, theta, x, rap):
    # u, g and h merged in; works element-by-element on whole arrays
    return (wc - (x**alpha) * (1 - np.exp(-rap * theta)) / theta
               - ((1 - x)**beta) * (1 - np.exp(theta)) / theta)

def f(y):
    # read from the global DataFrame `data`, not the parameter vector y
    ns = data['n'].values.astype('int')
    maxn = ns.max()
    x0 = np.full(maxn + 1, 0.1)
    # extend this column list (var6, var7, ...) to match your real data
    vars = data.loc[:, ('var1', 'var2', 'var3', 'var4', 'var5')].values
    if maxn > 5:
        vars = np.pad(vars, [(0, 0), (0, maxn - 5)], 'constant',
                      constant_values=0)
    lf = 0
    for n, var in izip(ns, vars):
        rap = var[:n]

        def f1(x, y):
            alpha, beta, theta = y[:3]
            tosolve = np.full_like(x, x[1:].sum() - 1)
            tosolve[:-1] = f2(x[0], alpha, beta, theta, x[1:], rap)
            return tosolve

        sol = scipy.optimize.root(f1, x0[:n + 1], args=(y,))
        print sol
        lf += np.log(sol.x[1])
    return -lf  # minimise -sum(ln p1), as in the original code
```
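You would then run the minimisation over \$(\alpha, \beta, \theta)\$ exactly as in your original `main`, for instance:

```python
import scipy.optimize

x0 = [1, 0, 0]  # initial guess for (alpha, beta, theta)
result = scipy.optimize.minimize(f, x0, method='Nelder-Mead')
print(result.x)
```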
If `n` can be a lot larger than 5 (even just once), this approach may result in excessive memory usage. Here is an alternative approach that runs more slowly but uses no excess memory.

It is the same, except that instead of padding the 2D array to the maximum possible length at the beginning, it pads each row to only the necessary length at each stage of the loop. This minimizes the amount of memory needed, but is slower at each stage of the loop.
```python
import numpy as np
import scipy.optimize
from itertools import izip  # just use the built-in zip in Python 3

def f2(wc, alpha, beta, theta, x, rap):
    return (wc - (x**alpha) * (1 - np.exp(-rap * theta)) / theta
               - ((1 - x)**beta) * (1 - np.exp(theta)) / theta)

def f(y):
    # read from the global DataFrame `data`, not the parameter vector y
    ns = data['n'].values.astype('int')
    maxn = ns.max()
    x0 = np.full(maxn + 1, 0.1)
    vars = data.loc[:, ('var1', 'var2', 'var3', 'var4', 'var5')].values
    lf = 0
    for n, var in izip(ns, vars):
        if n == 5:
            rap = var
        elif n > 5:
            # pad this row only, and only as far as needed
            rap = np.pad(var, (0, n - 5), 'constant', constant_values=0)
        else:
            rap = var[:n]

        def f1(x, y):
            alpha, beta, theta = y[:3]
            tosolve = np.full_like(x, x[1:].sum() - 1)
            tosolve[:-1] = f2(x[0], alpha, beta, theta, x[1:], rap)
            return tosolve

        sol = scipy.optimize.root(f1, x0[:n + 1], args=(y,), tol=0.001)
        print sol
        lf += np.log(sol.x[1])
    return -lf  # minimise -sum(ln p1), as in the original code
```
- The code works well, except that I had to change `ns = y['n']` to `data['n']` and `vars = y.loc` to `vars = data.loc`. Thank you! I still have a problem, though: it took 9 hours 42 minutes to get the result. – Ariane, Apr 13, 2015 at 17:32
- You can use `line_profiler` to try to find out where the slow points in the code are, but I suspect they will be in the `scipy.optimize.root` step. If that is the case, then parallelizing the loop with `multiprocessing.Pool.imap` would probably help. – TheBlackCat, Apr 13, 2015 at 18:06
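For reference, here is a minimal sketch of that suggestion: the per-row root solves are farmed out with `multiprocessing.Pool.imap`. The `solve_row` helper is hypothetical, and `f2` is assumed to be defined at module level (as in the answer above) so the workers can use it:

```python
import numpy as np
import scipy.optimize
from multiprocessing import Pool

def solve_row(args):
    # runs in a worker process: solve one line's root problem
    y, rap = args
    n = len(rap)

    def f1(x):
        alpha, beta, theta = y[:3]
        tosolve = np.full_like(x, x[1:].sum() - 1)
        tosolve[:-1] = f2(x[0], alpha, beta, theta, x[1:], rap)
        return tosolve

    sol = scipy.optimize.root(f1, np.full(n + 1, 0.1), tol=0.001)
    return np.log(sol.x[1])

def f(y):
    ns = data['n'].values.astype('int')
    vars = data.loc[:, ('var1', 'var2', 'var3', 'var4', 'var5')].values
    # in practice create the Pool once outside of f, since minimize
    # calls f many times; on Windows, guard with if __name__ == '__main__'
    pool = Pool()
    try:
        args = [(y, var[:n]) for n, var in zip(ns, vars)]
        return -sum(pool.imap(solve_row, args))
    finally:
        pool.close()
        pool.join()
```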
Some general ideas:

- Make sure to use descriptive names. I have basically no idea what anything in the file does or contains except `index` and `row`. And `new_elem` doesn't tell anybody anything. What does it contain? What is it for?
- You can get rid of `end2`; just use `range(end)` and `p` instead of `t`.
- Send the code through `pep8` and fix all the issues it reports.
- Extract `f1` to the same level as `f`, and pass in everything it needs. Then declare a trivial closure function which only runs `f1`, so you can pass it to `scipy.optimize.root` (a rough sketch follows this list).
- Don't reuse names like `y` in nested contexts. They might work as expected, but they are very likely to bite you.
- `range(0, end)` is the same as `range(end)`.
- You shouldn't pass both `x[0]` and `x[p]` to `f2`. Instead pass `x` (it's passed by reference, after all) and `p`.
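A rough sketch of the `range(end)` and `f1`-extraction points combined (untested; `f2` and `data` are as in the question):

```python
import numpy as np
import scipy.optimize

def f1(x, alpha, beta, theta, var, end):
    # the vector whose root gives (wc, p1, ..., pn) for one line of data
    tosolve = []
    unity = -1
    for p in range(end):
        tosolve.append(f2(x[0], alpha, beta, theta, x[p + 1], var[p]))
        unity += x[p + 1]
    tosolve.append(unity)
    return tosolve

def f(y):
    lf = 0
    for index, row in data.iterrows():
        end = int(row['n'])
        var = [row['var%d' % (i + 1)] for i in range(end)]
        x0 = [0.1] * (end + 1)

        # trivial closure so scipy.optimize.root sees an f(x, *args) signature
        def solve_one(x, params):
            return f1(x, params[0], params[1], params[2], var, end)

        sol = scipy.optimize.root(solve_one, x0, args=(y,), tol=0.001)
        lf += np.log(sol.x[1])
    return -lf
```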
Once you've fixed all of these it might be possible to see ways to optimise the code.