I wrote a task using pandas, and I'm wondering if the code can be optimized.

All the DataFrames are 919 × 919. `socio` is a DataFrame with many fields, and `matrices` is a dictionary that holds several DataFrame objects. I apply a series of formulas and store the results in the `result` DataFrame. The code goes roughly as follows:
```python
import numpy as np
from numpy import exp  # assumed source of exp; the post does not show this import
from pandas import DataFrame

frame = list(range(taz_start_id, taz_end_id + 1))
# frame = list from 1 to 919
result = DataFrame(0.0, index=frame, columns=frame)

# Get a Series and add it to all the rows of the DataFrame by index
temp = np.log(socio["ser_emp"] + 1) * 0.36310718
result = result.apply(lambda x: x + temp[x.index], axis=0)

# Divide two columns, apply a coefficient, fill all the NaNs with 0, and add to result
temp = (socio["hhpop"] / socio["acres"]) * -0.07379568
temp = temp.fillna(0)
result = result.apply(lambda x: x + temp[x.index], axis=0)

result = (matrices["avgtt"].transpose() * -0.05689183) + result

# Add 1.5 where 1 < dist <= 2.5, and 0 otherwise
result = ((matrices["dist"] > 1) & (matrices["dist"] <= 2.5)) * 1.5 + result

# Leave cells that are 0 as 0; replace every other cell with exp(value)
result = result.applymap(lambda x: 0 if x == 0 else exp(x))
```
Answer
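You don't want to use an awkward `lambda` to do the addition. You can use the `add` method with the parameter `axis=0`, which tells pandas which axis of the DataFrame the lower-dimensional argument (the Series) should be matched against; with `axis=0`, the Series is aligned with the row index.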
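A minimal sketch of that equivalence, assuming a toy 3 × 3 frame with made-up zone labels and values (not the real 919 × 919 data). It uses `.loc` for the label lookup that the question's `temp[x.index]` performs:

```python
import numpy as np
import pandas as pd

# Toy stand-ins (hypothetical values) for the zero-initialised result frame
# and a per-zone Series indexed by the same zone ids.
zones = [1, 2, 3]
result = pd.DataFrame(0.0, index=zones, columns=zones)
temp = pd.Series([0.5, 1.0, 2.0], index=zones)

# Question's approach: a Python lambda is called once per column,
# looking up temp by the row labels each time.
via_apply = result.apply(lambda col: col + temp.loc[col.index], axis=0)

# Answer's approach: one vectorised add that aligns temp with the row index.
via_add = result.add(temp, axis=0)

print(via_add)
```

Both produce a frame where row `i` holds `temp[i]` in every column; `add` just does it in one call instead of one lambda call per column.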
I bundled the two Series you are adding into one, so you only have to broadcast across the index once.
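A sketch of the bundling step on hypothetical data (three zones, zone 3 with a missing `acres` value). It also shows why `fillna(0)` is applied to the density term before the two Series are summed: otherwise a NaN density would make the whole bundled value NaN for that zone.

```python
import numpy as np
import pandas as pd

# Hypothetical zone attributes; zone 3 has NaN acres, so its density is NaN.
socio = pd.DataFrame(
    {"ser_emp": [10, 0, 250], "hhpop": [400, 120, 80], "acres": [50.0, 30.0, np.nan]},
    index=[1, 2, 3],
)
result = pd.DataFrame(0.0, index=socio.index, columns=socio.index)

# Bundle both per-zone terms into one Series. fillna(0) is applied to the
# density term only, before summing, so the employment term survives.
per_zone = np.log(socio.ser_emp.add(1)).mul(0.36310718).add(
    socio.hhpop.div(socio.acres).mul(-0.07379568).fillna(0)
)

# A single broadcast along the row index instead of two.
result = result.add(per_zone, axis=0)
print(result)
```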
Next, I changed a few things to match my style and improve readability.
Finally, for finding the cells that are not zero and filling them in with `exp`: I'm assuming `exp` is NumPy's `exp`, via some import such as `from numpy import exp`. Otherwise, replace it with `np.exp`. This is much more vectorized. `result` is a DataFrame, and `mask` turns every cell for which the passed condition evaluates to `True` into `np.nan`. I then chain a `fillna` with a vectorized call to `np.exp(result)`, so only the masked (non-zero) cells are replaced by their exponential, while the zeros stay 0.
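A small check of that last step on a hypothetical 2 × 2 frame, comparing a cell-by-cell version of the question's lambda with the `mask`/`fillna` chain. It also shows `DataFrame.where`, an equivalent one-step alternative (not what the answer uses) that keeps cells where the condition holds and takes the other value elsewhere:

```python
import math
import numpy as np
import pandas as pd

# Hypothetical values: some cells are exactly 0 and must stay 0.
result = pd.DataFrame([[0.0, 1.2], [-0.5, 0.0]], index=[1, 2], columns=[1, 2])

# Cell-by-cell version, equivalent to the question's applymap lambda.
elementwise = result.apply(lambda col: col.map(lambda x: 0.0 if x == 0 else math.exp(x)))

# Answer's version: mask() turns the non-zero cells into NaN, then fillna()
# fills exactly those cells from the vectorised np.exp(result).
masked = result.mask(result != 0).fillna(np.exp(result))

# One-step alternative: keep the zeros, take np.exp everywhere else.
where_version = result.where(result == 0, np.exp(result))

print(masked)
```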
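```python
import numpy as np
from numpy import exp  # assumed, as noted above; np.exp works the same way

# One broadcast of both per-zone terms down the rows
result = result.add(
    np.log(socio.ser_emp.add(1)).mul(.36310718).add(
        socio.hhpop.div(socio.acres).mul(-.07379568).fillna(0)
    ), axis=0)

# Matrix terms: travel time plus 1.5 where 1 < dist <= 2.5
avgtt, dist = matrices['avgtt'], matrices['dist']
result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5

# Keep the zeros, exponentiate everything else
result = result.mask(result != 0).fillna(exp(result))
```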
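A sketch for checking that the rewrite gives the same numbers as the original pipeline, assuming small hypothetical stand-ins for `socio` and `matrices` (4 zones, made-up values). Both pipelines are wrapped in functions so they can run side by side; the cell-wise step uses `Series.map` in place of `applymap`:

```python
import math
import numpy as np
import pandas as pd

zones = [1, 2, 3, 4]  # 4 toy zones instead of the real 919
socio = pd.DataFrame(
    {
        "ser_emp": [120, 0, 45, 900],
        "hhpop": [3000, 150, 0, 800],
        "acres": [60.0, 25.0, 10.0, np.nan],  # NaN density is filled with 0
    },
    index=zones,
)
avgtt = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4) + 5.0, index=zones, columns=zones)
dist = pd.DataFrame(
    [[0.5, 1.5, 2.0, 3.0], [1.5, 0.5, 2.6, 1.1], [2.0, 2.6, 0.5, 2.5], [3.0, 1.1, 2.5, 0.5]],
    index=zones,
    columns=zones,
)
matrices = {"avgtt": avgtt, "dist": dist}


def original(socio, matrices):
    """Question's pipeline: a lambda per column, then a lambda per cell."""
    result = pd.DataFrame(0.0, index=socio.index, columns=socio.index)
    temp = np.log(socio["ser_emp"] + 1) * 0.36310718
    result = result.apply(lambda x: x + temp.loc[x.index], axis=0)
    temp = ((socio["hhpop"] / socio["acres"]) * -0.07379568).fillna(0)
    result = result.apply(lambda x: x + temp.loc[x.index], axis=0)
    result = (matrices["avgtt"].transpose() * -0.05689183) + result
    result = ((matrices["dist"] > 1) & (matrices["dist"] <= 2.5)) * 1.5 + result
    return result.apply(lambda col: col.map(lambda v: 0.0 if v == 0 else math.exp(v)))


def vectorized(socio, matrices):
    """Answer's pipeline: one add(axis=0), then mask/fillna with np.exp."""
    result = pd.DataFrame(0.0, index=socio.index, columns=socio.index)
    result = result.add(
        np.log(socio.ser_emp.add(1)).mul(0.36310718).add(
            socio.hhpop.div(socio.acres).mul(-0.07379568).fillna(0)
        ),
        axis=0,
    )
    avgtt, dist = matrices["avgtt"], matrices["dist"]
    result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5
    return result.mask(result != 0).fillna(np.exp(result))


print(np.allclose(original(socio, matrices), vectorized(socio, matrices)))
```

The additions happen in a different order in the two versions, so the results can differ in the last floating-point digits; compare them with a tolerance rather than exact equality.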
- This is pretty helpful! I'll go ahead and approach it the way you are. Thank you. I'll wait to see if there are other answers before accepting your answer. (Daniel, Jan 5, 2017 at 23:26)
- Trying `result = result.add(np.log(socio.basic_emp + 1).mul(0.24160834))` is returning the DataFrame with all 0s and not with the updated values. Any idea why? (Daniel, Jan 5, 2017 at 23:45)
- You don't have `axis=0`. (piRSquared, Jan 5, 2017 at 23:46)
- You are right, thanks. This seems to be working now. (Daniel, Jan 5, 2017 at 23:48)
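The fix in the comment thread comes down to the `axis` argument of `DataFrame.add`. A minimal sketch on a hypothetical 3 × 3 frame: by default a Series is matched against the column labels, so every row receives the same pattern; with `axis=0` it is matched against the row labels, which is what the per-zone terms here need.

```python
import pandas as pd

zones = [1, 2, 3]
result = pd.DataFrame(0.0, index=zones, columns=zones)
per_zone = pd.Series([10.0, 20.0, 30.0], index=zones)  # hypothetical values

# Default axis="columns": the Series index is matched against the column
# labels, so every row receives the same [10, 20, 30] pattern.
by_columns = result.add(per_zone)

# axis=0 (same as axis="index"): the Series index is matched against the row
# labels, so row i receives per_zone[i] in every column.
by_rows = result.add(per_zone, axis=0)

print(by_columns)
print(by_rows)
```

With square frames whose index and columns share the same labels, the default silently produces the transpose of the intended result instead of raising an error.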