
I wrote a task using pandas and I'm wondering whether the code can be optimized. The code goes roughly as follows:

All the DataFrames are 919 × 919. `socio` is a DataFrame with many fields, and `matrices` is a dictionary that holds several DataFrame objects.

I apply a series of formulas and store the results in the `result` DataFrame.

```python
import numpy as np
import pandas as pd

# Inputs defined elsewhere in my script: taz_start_id, taz_end_id, socio, matrices
frame = list(range(taz_start_id, taz_end_id + 1))  # zone ids 1 to 919
result = pd.DataFrame(0.0, index=frame, columns=frame)
# Get a Series and add it to every column of the DataFrame, aligned on the row index
temp = np.log(socio["ser_emp"] + 1) * 0.36310718
result = result.apply(lambda x: x + temp[x.index], axis=0)
# Divide two columns, apply a coefficient and fill every NaN with 0; add to result
temp = (socio["hhpop"] / socio["acres"]) * -0.07379568
temp = temp.fillna(0)
result = result.apply(lambda x: x + temp[x.index], axis=0)
result = (matrices["avgtt"].transpose() * -0.05689183) + result
# Add 1.5 where dist is in (1, 2.5], otherwise add 0
result = ((matrices["dist"] > 1) & (matrices["dist"] <= 2.5)) * 1.5 + result
# Keep cells that are exactly 0; replace every other cell with exp(value)
result = result.applymap(lambda x: 0 if x == 0 else np.exp(x))
```
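For reference, the script above can be read as one per-cell formula. This is my own restatement of the code, with i the row zone and j the column zone; the density term is taken as 0 wherever the ratio comes out NaN (mirroring the `fillna(0)`), and the transpose means the travel-time term uses avgtt at (j, i):

```latex
x_{ij} = 0.36310718\,\ln(\mathit{ser\_emp}_i + 1)
       - 0.07379568\,\frac{\mathit{hhpop}_i}{\mathit{acres}_i}
       - 0.05689183\,\mathit{avgtt}_{ji}
       + 1.5\cdot\mathbf{1}\left[1 < \mathit{dist}_{ij} \le 2.5\right]

\mathit{result}_{ij} =
\begin{cases}
  0 & \text{if } x_{ij} = 0,\\
  e^{x_{ij}} & \text{otherwise.}
\end{cases}
```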
asked Jan 5, 2017 at 20:47

1 Answer


You don't need an awkward lambda to do the addition. Use the `add` method with `axis=0`, which tells pandas to match the index of the lower-dimensional argument (the Series) against the DataFrame's row index.
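A minimal sketch of that equivalence on a hypothetical 3 × 3 frame (the zone ids and values are made up): the lambda-based `apply` and `add(..., axis=0)` produce the same frame, but `add` does it as one vectorized, label-aligned operation instead of a Python call per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a 3 x 3 zone matrix and one per-zone Series
zones = [1, 2, 3]
result = pd.DataFrame(0.0, index=zones, columns=zones)
temp = pd.Series([0.5, 1.0, 2.0], index=zones)

# Question's approach: add temp row-wise through apply + lambda
via_apply = result.apply(lambda col: col + temp[col.index], axis=0)

# Answer's approach: align temp with the row index using axis=0
via_add = result.add(temp, axis=0)

# Both give row i filled with temp[i]
pd.testing.assert_frame_equal(via_apply, via_add)
print(via_add)
```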

I bundled the two Series you are adding into one, so you only have to broadcast across the index once.
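To illustrate the bundling, here is a small sketch with made-up values for three zones (the `socio` frame below is hypothetical, not the real data). Adding the two terms one at a time and adding their sum once give the same frame; zone 2 has 0 / 0 in the density term, which shows the `fillna(0)` at work.

```python
import numpy as np
import pandas as pd

# Hypothetical socio-economic inputs for three zones (invented values)
socio = pd.DataFrame(
    {"ser_emp": [0.0, 9.0, 99.0],
     "hhpop": [10.0, 0.0, 40.0],
     "acres": [5.0, 0.0, 8.0]},  # zone 2: 0 / 0 -> NaN, later filled with 0
    index=[1, 2, 3],
)
result = pd.DataFrame(0.0, index=socio.index, columns=socio.index)

emp_term = np.log(socio.ser_emp.add(1)).mul(0.36310718)
density_term = socio.hhpop.div(socio.acres).mul(-0.07379568).fillna(0)

# Two separate broadcasts across the index ...
twice = result.add(emp_term, axis=0).add(density_term, axis=0)
# ... versus summing the two Series first and broadcasting once
once = result.add(emp_term.add(density_term), axis=0)

pd.testing.assert_frame_equal(twice, once)
```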

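Next, I made a few changes to match my style and improve readability: pulling `avgtt` and `dist` out of `matrices` into local names, and folding the remaining two terms into a single `+=`.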

Finally, for finding the non-zero cells and filling them in with exp: I'm assuming `exp` is NumPy's `exp`, imported via something like `from numpy import exp`; otherwise, replace it with `np.exp`. This is much more vectorized than `applymap`. `result` is a DataFrame, and `mask` turns every cell where the passed condition evaluates to `True` into `np.nan`. I then chain a `fillna` with a vectorized call to `exp(result)`, which fills each of those NaN cells with the matching cell of `exp(result)`; cells that were 0 are left untouched.
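A quick sanity check of `mask(result != 0).fillna(exp(result))` on a tiny hypothetical frame. The `where` line is an alternative I'm adding for comparison, not part of this answer's code: `where` keeps cells where the condition holds and takes the other cells from `np.exp(result)`, which skips the intermediate NaNs. Both are compared against a plain NumPy reference.

```python
import numpy as np
import pandas as pd

# Hypothetical pre-exponentiation values; zeros must stay zero
result = pd.DataFrame([[0.0, 1.0], [-0.5, 0.0]], index=[1, 2], columns=[1, 2])

# Answer's approach: mask non-zero cells to NaN, then fill them from np.exp(result)
masked = result.mask(result != 0).fillna(np.exp(result))

# Alternative: keep cells equal to 0, take every other cell from np.exp(result)
kept = result.where(result == 0, np.exp(result))

# Reference computed directly with NumPy
values = result.to_numpy()
expected = np.where(values == 0, 0.0, np.exp(values))

assert np.allclose(masked.to_numpy(), expected)
assert np.allclose(kept.to_numpy(), expected)
```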

```python
import numpy as np
from numpy import exp  # the `exp` assumed above

# result starts as the all-zero DataFrame from the question
result = result.add(
    np.log(socio.ser_emp.add(1)).mul(.36310718).add(
        socio.hhpop.div(socio.acres).mul(-.07379568).fillna(0)
    ),
    axis=0,
)
avgtt, dist = matrices['avgtt'], matrices['dist']
result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5
result = result.mask(result != 0).fillna(exp(result))
```
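If you want to confirm the rewrite matches your original before swapping it in, a sketch like the following compares both versions on small random stand-ins. Everything here is hypothetical: the five zones, the random `socio` columns, and the `matrices` entries are invented for the test, and the original's `applymap` is written as a per-column `map` because `applymap` is deprecated in recent pandas. The two versions add terms in a different order, so they are compared with `assert_frame_equal`'s default floating-point tolerance rather than exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real inputs (the real matrices are 919 x 919)
rng = np.random.default_rng(0)
zones = list(range(1, 6))
socio = pd.DataFrame(
    {"ser_emp": rng.integers(0, 500, len(zones)).astype(float),
     "hhpop": rng.integers(0, 2000, len(zones)).astype(float),
     "acres": rng.uniform(1, 100, len(zones))},
    index=zones,
)
matrices = {
    "avgtt": pd.DataFrame(rng.uniform(1, 60, (5, 5)), index=zones, columns=zones),
    "dist": pd.DataFrame(rng.uniform(0, 5, (5, 5)), index=zones, columns=zones),
}

def question_version(socio, matrices):
    result = pd.DataFrame(0.0, index=socio.index, columns=socio.index)
    temp = np.log(socio["ser_emp"] + 1) * 0.36310718
    result = result.apply(lambda x: x + temp[x.index], axis=0)
    temp = ((socio["hhpop"] / socio["acres"]) * -0.07379568).fillna(0)
    result = result.apply(lambda x: x + temp[x.index], axis=0)
    result = (matrices["avgtt"].transpose() * -0.05689183) + result
    result = ((matrices["dist"] > 1) & (matrices["dist"] <= 2.5)) * 1.5 + result
    # Per-cell Python lambda, equivalent to the original applymap
    return result.apply(lambda col: col.map(lambda x: 0.0 if x == 0 else np.exp(x)))

def answer_version(socio, matrices):
    result = pd.DataFrame(0.0, index=socio.index, columns=socio.index)
    result = result.add(
        np.log(socio.ser_emp.add(1)).mul(0.36310718).add(
            socio.hhpop.div(socio.acres).mul(-0.07379568).fillna(0)
        ),
        axis=0,
    )
    avgtt, dist = matrices["avgtt"], matrices["dist"]
    result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5
    return result.mask(result != 0).fillna(np.exp(result))

q = question_version(socio, matrices)
a = answer_version(socio, matrices)
pd.testing.assert_frame_equal(q, a)  # default rtol tolerates summation order
```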
answered Jan 5, 2017 at 23:05
  • This is pretty helpful! I'll go ahead and approach it the way you did. Thank you. I'll wait to see if there are other answers before accepting yours. Commented Jan 5, 2017 at 23:26
  • Trying `result = result.add(np.log(socio.basic_emp + 1).mul(0.24160834))` returns the DataFrame with all 0s rather than the updated values. Any idea why? Commented Jan 5, 2017 at 23:45
  • You don't have `axis=0`. Commented Jan 5, 2017 at 23:46
  • You are right, thanks. This seems to be working now. Commented Jan 5, 2017 at 23:48
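For anyone hitting the same issue: `DataFrame.add` defaults to `axis='columns'`, so a Series is matched against the column labels unless you pass `axis=0`. A small sketch on a hypothetical non-square frame makes the difference visible (whether this reproduces the exact all-zero output from the comment depends on the real indexes, which aren't shown):

```python
import pandas as pd

# Hypothetical non-square frame so the two alignments are easy to tell apart
df = pd.DataFrame(0.0, index=["a", "b"], columns=["x", "y", "z"])
per_row = pd.Series([1.0, 2.0], index=["a", "b"])

# Default axis="columns": the Series index is matched against the column labels.
# "a" and "b" are not columns, so the outer join adds them and every cell is NaN.
by_columns = df.add(per_row)

# axis=0: the Series index is matched against the row labels instead
by_rows = df.add(per_row, axis=0)
print(by_rows)
```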
