I am new to Python. I wrote some code that works, but the algorithm is clearly inefficient. I am stuck with the loop over `iterrows` and I do not see how to get rid of it.
First, a little bit about the data; it looks like this:
```
var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 var16 n
1 30.3 4.1 10.2 2.5 10.8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 6
0.9 10.4 4.1 6.3 3.3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 5
2.5 8.8 13.9 11.8 6.1 26.8 5 2.4 NaN NaN NaN NaN NaN NaN NaN NaN 8
0.5 10.7 44 12.3 5 12.8 8.9 NaN NaN NaN NaN NaN NaN NaN NaN NaN 7
1 24.2 7.5 53.9 7 11.3 47.7 20.6 39.4 4.6 52.5 65.9 NaN NaN NaN NaN 12
16.6 8.7 1.7 39.5 47.4 9 36.7 30.4 4.7 29.9 6.5 32.2 16.2 NaN NaN NaN 13
5.8 48.2 21.3 31.9 7.4 49.3 62.5 67.4 0.5 8.3 NaN NaN NaN NaN NaN NaN 10
1.2 40.9 34.7 6.2 11.5 82.9 68.5 14 4.7 8.1 49.7 38.2 43.2 NaN NaN NaN 13
4.1 20.7 8.2 5.3 16.9 12.5 57.9 11.8 49.4 19.9 20.5 15.7 4.9 38.2 32.4 31.5 16
7 30 22.1 10.3 26.8 2.2 16.6 30.4 3.8 53.3 13.1 7.1 55.4 NaN NaN NaN 13
```
As you can see, `n` is the number of non-missing variables. Here I show only 10 lines of the data, but in practice there are 20,000 lines, and `n` goes up to 24, not 16.
Now a little more on what I am trying to do:
The ultimate goal is to compute \$y = (\alpha, \beta, \theta)\$.
I have three functions `u`, `g` and `h` from which I define `f2`.
I want to find the root \$x = (wc, p_1, \dots, p_n)\$ of this vector:
\$\left(f2(wc, \alpha, \beta, \theta, p_1, var_1),\ f2(wc, \alpha, \beta, \theta, p_2, var_2),\ \dots,\ f2(wc, \alpha, \beta, \theta, p_n, var_n),\ \sum\limits_{i=1}^{n} p_i - 1\right)\$ for each line in the dataset.
So for the first line of the data, where \$n = 6\,ドル I am looking for \$x_1 = (wc_1, p_{1,1}, \dots, p_{6,1})\,ドル which is the root of \$\left(f2(wc_1, \alpha, \beta, \theta, p_{1,1}, var_{1,1}),\ f2(wc_1, \alpha, \beta, \theta, p_{2,1}, var_{2,1}),\ \dots,\ f2(wc_1, \alpha, \beta, \theta, p_{6,1}, var_{6,1}),\ \sum\limits_{i=1}^{6} p_{i,1} - 1\right)\$.
For the second line I look for \$x_2 = (wc_2, p_{1,2}, \dots, p_{5,2})\,ドル and so on.
When I have solved this system for each of the 10 lines in the dataset, I keep all the \$p_1\$ values \$(p_{1,1}, p_{1,2}, \dots, p_{1,10})\$ and look for the \$y = (\alpha, \beta, \theta)\$ that minimizes \$-\sum\limits_{j=1}^{10} \ln(p_{1,j})\$.
Here is my code:
```python
import numpy as np
import scipy.optimize

# data is the pandas DataFrame shown above, loaded elsewhere

# define u
def u(x, theta):
    return (1 - np.exp(-x * theta)) / theta

# define g
def g(p, alpha):
    return p ** alpha

# define h
def h(p, beta):
    return p ** beta

# define f2
def f2(wc, alpha, beta, theta, pi, var):
    return wc - g(pi, alpha) * u(var, theta) - h(1 - pi, beta) * u(-1, theta)

# f finds the root of the vector (wc, p1, ..., pn) for each line of the data
# and gives back -sum ln(p1).
# f1 defines the vector whose root is sought for each line of the data.
def f(y):
    lf = 0
    for index, row in data.iterrows():
        end = int(row['n'])
        end2 = int(row['n'] + 1)
        var = [0] * int(row['n'])
        var2 = [row['var1'], row['var2'], row['var3'], row['var4'],
                row['var5'], row['var6'], row['var7'], row['var8'],
                row['var9'], row['var10'], row['var11'], row['var12'],
                row['var13'], row['var14'], row['var15'], row['var16']]
        for k in range(0, end):
            var[k] = var2[k]

        def f1(x, y):
            tosolve = []
            unity = -1
            alpha = y[0]
            beta = y[1]
            theta = y[2]
            for p in range(1, end2):
                t = p - 1
                new_elem = f2(x[0], alpha, beta, theta, x[p], var[t])
                tosolve.append(new_elem)
                unity = unity + x[p]
            tosolve.append(unity)
            return tosolve

        x0 = [0.1] * int(row['n'] + 1)
        sol = scipy.optimize.root(f1, x0, args=(y,), tol=0.001)
        print sol
        solp1 = sol.x[1]
        lf = lf + np.log(solp1)
    print -lf
    return -lf

# this is the minimisation; it gives back alpha, beta and theta
def main():
    x0 = [1, 0, 0]
    result = scipy.optimize.minimize(f, x0, method='Nelder-Mead')
    return result

print main()
```
The dataset is huge, so looping over all the lines takes much too long. My goal is to avoid looping as much as possible and to perform operations on whole vectors directly.
Thank you for your help and advice.
2 Answers
I don't know how to avoid a loop entirely, but this approach should be faster, at the expense of some extra memory usage.

Basically, using pandas for this is overkill; it will hurt performance for no benefit. So the first thing I do is extract the values you want into numpy arrays. `n` goes into one vector (1D array), while `var1`, `var2`, `var3`, `var4`, and `var5` go into a 2D array. I then pad that 2D array so each row is the maximum possible length of `rap`. I also create an initial `x0` array of the maximum possible length.
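For example, padding a small 2D array out to a maximum row length of 4 looks like this (a toy illustration with made-up numbers, not data from the question):

```python
import numpy as np

arr = np.array([[1.5, 2.0],
                [0.5, 3.5]])
# pad_width is [(rows before, rows after), (cols before, cols after)]:
# add two zero-filled columns on the right, none on the left
padded = np.pad(arr, [(0, 0), (0, 2)], 'constant', constant_values=0)
print(padded)
# [[ 1.5  2.   0.   0. ]
#  [ 0.5  3.5  0.   0. ]]
```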
Next, I use `izip` (or just `zip` in Python 3.x) to iterate over each combination of `n` and `rap`. I then slice `rap` and `x0` down to the right lengths, so they don't need to be recalculated each time. These slices are views, so they take no additional memory.
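To illustrate the claim about views (again a toy example, not code from the solution): slicing a numpy array does not copy its data, so the slice costs almost nothing:

```python
import numpy as np

x0 = np.full(10, 0.1)
head = x0[:6]            # a view into x0: no data is copied
print(head.base is x0)   # True -- head shares x0's memory
head[0] = 99.0
print(x0[0])             # 99.0 -- changes through the view show up in x0
```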
I have also simplified the `f1` function. It does more or less the same thing, just with simpler code, so there won't be a huge speedup there, although avoiding doing the indexing element-by-element each time may give a small speedup.
For `f2`, there is always some overhead to function calls, so when you are doing very simple mathematical operations just once or twice, as you are, it is better to limit them. Further, because of how numpy arrays work, mathematical operations are applied element-by-element, but much faster than an explicit loop. So rather than looping over each element like you do, you can just pass a numpy array to `f2` and it will work. All I had to do there was merge the helper functions into `f2` and call `f2` with the arrays as arguments rather than looping manually. It is these changes that will most likely give you the largest performance improvement.
```python
import numpy as np
import scipy.optimize
from itertools import izip  # just use the built-in zip in Python 3

def f2(wc, alpha, beta, theta, x, rap):
    # u, g and h merged in; works element-by-element on whole arrays
    return (wc - (x**alpha) * (1 - np.exp(-rap * theta)) / theta
               - ((1 - x)**beta) * (1 - np.exp(theta)) / theta)

def f(y):
    # read from the global DataFrame `data`, not the parameter vector y
    ns = data['n'].values.astype('int')
    maxn = ns.max()
    x0 = np.full(maxn + 1, 0.1)
    # extend this column list (var6, var7, ...) to match your real data
    vars = data.loc[:, ('var1', 'var2', 'var3', 'var4', 'var5')].values
    if maxn > 5:
        vars = np.pad(vars, [(0, 0), (0, maxn - 5)], 'constant',
                      constant_values=0)
    lf = 0
    for n, var in izip(ns, vars):
        rap = var[:n]

        def f1(x, y):
            alpha, beta, theta = y[:3]
            tosolve = np.full_like(x, x[1:].sum() - 1)
            tosolve[:-1] = f2(x[0], alpha, beta, theta, x[1:], rap)
            return tosolve

        sol = scipy.optimize.root(f1, x0[:n + 1], args=(y,))
        print sol
        lf += np.log(sol.x[1])
    return -lf  # minimise -sum(ln p1), as in the original code
```
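You would then run the minimisation over \$(\alpha, \beta, \theta)\$ exactly as in your original `main`, for instance:

```python
import scipy.optimize

x0 = [1, 0, 0]  # initial guess for (alpha, beta, theta)
result = scipy.optimize.minimize(f, x0, method='Nelder-Mead')
print(result.x)
```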
If `n` can be a lot larger than 5 (even just once), this approach may result in excessive memory usage. Here is an alternative approach that runs more slowly but uses no excess memory.

It is the same, except that instead of padding the 2D array to the maximum possible length at the beginning, it pads each row to only the necessary length at each stage of the loop. This minimizes the amount of memory needed, but is slower at each stage of the loop.
```python
import numpy as np
import scipy.optimize
from itertools import izip  # just use the built-in zip in Python 3

def f2(wc, alpha, beta, theta, x, rap):
    return (wc - (x**alpha) * (1 - np.exp(-rap * theta)) / theta
               - ((1 - x)**beta) * (1 - np.exp(theta)) / theta)

def f(y):
    # read from the global DataFrame `data`, not the parameter vector y
    ns = data['n'].values.astype('int')
    maxn = ns.max()
    x0 = np.full(maxn + 1, 0.1)
    vars = data.loc[:, ('var1', 'var2', 'var3', 'var4', 'var5')].values
    lf = 0
    for n, var in izip(ns, vars):
        if n == 5:
            rap = var
        elif n > 5:
            # pad this row only, and only as far as needed
            rap = np.pad(var, (0, n - 5), 'constant', constant_values=0)
        else:
            rap = var[:n]

        def f1(x, y):
            alpha, beta, theta = y[:3]
            tosolve = np.full_like(x, x[1:].sum() - 1)
            tosolve[:-1] = f2(x[0], alpha, beta, theta, x[1:], rap)
            return tosolve

        sol = scipy.optimize.root(f1, x0[:n + 1], args=(y,), tol=0.001)
        print sol
        lf += np.log(sol.x[1])
    return -lf  # minimise -sum(ln p1), as in the original code
```
- The code works well, except that I had to change `ns = y['n']` to `data['n']` and `vars = y.loc` to `vars = data.loc`. Thank you! I still have a problem, though: it took 9 hours 42 minutes to get the result. – Ariane, Apr 13, 2015 at 17:32
- You can use `line_profiler` to try to find out where the slow points in the code are, but I suspect they will be in the `scipy.optimize.root` step. If that is the case, then parallelizing the loop with `multiprocessing.Pool.imap` would probably help. – TheBlackCat, Apr 13, 2015 at 18:06
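For reference, here is a minimal sketch of that suggestion: the per-row root solves are farmed out with `multiprocessing.Pool.imap`. The `solve_row` helper is hypothetical, and `f2` is assumed to be defined at module level (as in the answer above) so the workers can use it:

```python
import numpy as np
import scipy.optimize
from multiprocessing import Pool

def solve_row(args):
    # runs in a worker process: solve one line's root problem
    y, rap = args
    n = len(rap)

    def f1(x):
        alpha, beta, theta = y[:3]
        tosolve = np.full_like(x, x[1:].sum() - 1)
        tosolve[:-1] = f2(x[0], alpha, beta, theta, x[1:], rap)
        return tosolve

    sol = scipy.optimize.root(f1, np.full(n + 1, 0.1), tol=0.001)
    return np.log(sol.x[1])

def f(y):
    ns = data['n'].values.astype('int')
    vars = data.loc[:, ('var1', 'var2', 'var3', 'var4', 'var5')].values
    # in practice create the Pool once outside of f, since minimize
    # calls f many times; on Windows, guard with if __name__ == '__main__'
    pool = Pool()
    try:
        args = [(y, var[:n]) for n, var in zip(ns, vars)]
        return -sum(pool.imap(solve_row, args))
    finally:
        pool.close()
        pool.join()
```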
Some general ideas:

- Make sure to use descriptive names. I have basically no idea what anything in the file does or contains except `index` and `row`. And `new_elem` doesn't tell anybody anything. What does it contain? What is it for?
- You can get rid of `end2`; just use `range(end)` and `p` instead of `t`.
- Send the code through `pep8` and fix all the issues it reports.
- Extract `f1` to the same level as `f`, and pass in everything it needs. Then declare a trivial closure function which only runs `f1`, so you can pass it to `scipy.optimize.root` (a rough sketch follows this list).
- Don't reuse names like `y` in nested contexts. They might work as expected, but they are very likely to bite you.
- `range(0, end)` is the same as `range(end)`.
- You shouldn't pass both `x[0]` and `x[p]` to `f2`. Instead pass `x` (it's passed by reference, after all) and `p`.
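A rough sketch of the `range(end)` and `f1`-extraction points combined (untested; `f2` and `data` are as in the question):

```python
import numpy as np
import scipy.optimize

def f1(x, alpha, beta, theta, var, end):
    # the vector whose root gives (wc, p1, ..., pn) for one line of data
    tosolve = []
    unity = -1
    for p in range(end):
        tosolve.append(f2(x[0], alpha, beta, theta, x[p + 1], var[p]))
        unity += x[p + 1]
    tosolve.append(unity)
    return tosolve

def f(y):
    lf = 0
    for index, row in data.iterrows():
        end = int(row['n'])
        var = [row['var%d' % (i + 1)] for i in range(end)]
        x0 = [0.1] * (end + 1)

        # trivial closure so scipy.optimize.root sees an f(x, *args) signature
        def solve_one(x, params):
            return f1(x, params[0], params[1], params[2], var, end)

        sol = scipy.optimize.root(solve_one, x0, args=(y,), tol=0.001)
        lf += np.log(sol.x[1])
    return -lf
```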
Once you've fixed all of these it might be possible to see ways to optimise the code.