Applying different equations to a Pandas DataFrame

Question

I wrote a task using pandas, and I'm wondering whether the code can be optimized. It goes roughly like this:

All the DataFrames are 919 x 919. socio is a DataFrame with many fields, and matrices is a dictionary that holds several DataFrame objects.

I apply a series of formulas and store the results in the result DataFrame.

import numpy as np
from numpy import exp
from pandas import DataFrame

# taz_start_id, taz_end_id, socio and matrices are defined elsewhere in the task
# frame = zone IDs 1 to 919, used as both the row and column labels
frame = list(range(taz_start_id, taz_end_id + 1))
result = DataFrame(0.0, index=frame, columns=frame)
# Get a Series and add it to every column of the DataFrame, matching by row index
temp = np.log(socio["ser_emp"] + 1) * 0.36310718
result = result.apply(lambda x: x + temp[x.index], axis=0)
# Divide two columns, apply a coefficient, fill NaN with 0, and add it the same way
temp = (socio["hhpop"] / socio["acres"]) * -0.07379568
temp = temp.fillna(0)
result = result.apply(lambda x: x + temp[x.index], axis=0)
result = (matrices["avgtt"].transpose() * -0.05689183) + result
# Add 1.5 where 1 < dist <= 2.5, otherwise add 0
result = ((matrices["dist"] > 1) & (matrices["dist"] <= 2.5)) * 1.5 + result
# Leave cells that are 0 as 0; replace every other cell with exp(value)
# (applymap is called DataFrame.map in pandas >= 2.1)
result = result.applymap(lambda x: 0 if x == 0 else exp(x))
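
The 1.5 term in the code above works because multiplying a boolean DataFrame by a number gives that number where the condition holds and 0 elsewhere. A minimal sketch on a hypothetical 2 x 3 distance matrix (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical distance matrix; in the question this is matrices["dist"].
dist = pd.DataFrame([[0.5, 1.0, 2.0], [2.5, 3.0, 1.5]])

# The parentheses are required: & binds more tightly than > and <=,
# so writing dist > 1 & dist <= 2.5 would not mean what it looks like.
in_band = (dist > 1) & (dist <= 2.5)

# Equivalent spelling using the comparison methods.
in_band_methods = dist.gt(1) & dist.le(2.5)

# True becomes 1.5 and False becomes 0.0.
bonus = in_band * 1.5
print(bonus)
```

Note that 1.0 is excluded and 2.5 is included, matching the > and <= in the condition.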

Answer

You don't need an awkward lambda to do the addition. Use the add method with axis=0, which specifies the axis on which to match the lower-dimensional argument: the Series is aligned with the DataFrame's row index and broadcast across the columns.
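
A small check, on a hypothetical 3-zone frame rather than the real 919 x 919 one, that the lambda used in the question and add with axis=0 produce the same DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the result matrix: 3 zones instead of 919.
zones = [1, 2, 3]
result = pd.DataFrame(0.0, index=zones, columns=zones)

# Hypothetical per-zone Series, indexed like result's rows.
temp = pd.Series([10.0, 20.0, 30.0], index=zones)

# Question's approach: call a lambda on every column, aligning temp by row label.
via_apply = result.apply(lambda col: col + temp[col.index], axis=0)

# Vectorized approach: axis=0 matches temp's index against result's index,
# so row i gets temp[i] in every column.
via_add = result.add(temp, axis=0)
print(via_add)
```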

I combined the two Series you are adding into one, so you only have to broadcast across the index once.
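
A sketch of the bundling idea, using a made-up socio table with three zones (the column names follow the question; the values are invented). Adding the two Series to each other first is cheap, since they hold one value per zone, and it leaves a single DataFrame-wide broadcast:

```python
import numpy as np
import pandas as pd

zones = [1, 2, 3]
result = pd.DataFrame(0.0, index=zones, columns=zones)

# Hypothetical socio fields; zone 3 has zero population and zero acres.
socio = pd.DataFrame(
    {"ser_emp": [0.0, 9.0, 99.0], "hhpop": [50.0, 80.0, 0.0], "acres": [10.0, 40.0, 0.0]},
    index=zones,
)

term_emp = np.log(socio.ser_emp.add(1)).mul(0.36310718)
# 0 / 0 gives NaN, which fillna(0) turns into 0.
term_density = socio.hhpop.div(socio.acres).mul(-0.07379568).fillna(0)

# Two broadcasts over the whole matrix, one per Series.
two_steps = result.add(term_emp, axis=0).add(term_density, axis=0)

# One broadcast: combine the per-zone Series first, then add once.
one_step = result.add(term_emp.add(term_density), axis=0)
```

One caveat: fillna(0) only replaces NaN, which the division produces for 0 / 0. If a zone could have a positive hhpop and zero acres, the division gives inf instead and fillna(0) leaves it in place; in that case something like .replace([np.inf, -np.inf], np.nan) before fillna(0) would be needed.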

Next, I restyled the remaining lines to match my style and improve readability.

Finally, for finding the non-zero cells and filling them with exp: I'm assuming exp is NumPy's exp, imported via something like from numpy import exp; otherwise, replace it with np.exp. This is vectorized, unlike applymap, which calls a Python lambda once per cell. result is a DataFrame, and mask turns every cell where the passed condition is True into np.nan. I then chain fillna with a vectorized call to np.exp(result), which fills only those NaN cells and leaves the zeros alone.
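
A small check of the mask/fillna step on a hypothetical 2 x 2 frame. The zero test matters because exp(0) is 1, so zeros would otherwise turn into ones. The same result can also be written with DataFrame.where, which keeps cells where the condition is True and takes the replacement elsewhere; that is a swapped-in alternative, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical values: two zeros that must stay 0, two cells to exponentiate.
result = pd.DataFrame([[0.0, 1.0], [-2.0, 0.0]])

# Answer's version: non-zero cells become NaN, then get filled with exp of the
# original values (fillna accepts a DataFrame and aligns it cell by cell).
masked = result.mask(result != 0).fillna(np.exp(result))

# Alternative spelling: keep cells where result == 0, else take np.exp(result).
kept = result.where(result == 0, np.exp(result))
print(masked)
```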

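import numpy as np

# result, socio and matrices are set up as in the question.
# Sum the two per-zone terms into one Series, then broadcast it once;
# axis=0 aligns the Series with result's row index.
result = result.add(
    np.log(socio.ser_emp.add(1)).mul(.36310718).add(
        socio.hhpop.div(socio.acres).mul(-.07379568).fillna(0)
    ), axis=0)
avgtt, dist = matrices['avgtt'], matrices['dist']
# Travel-time term plus 1.5 wherever 1 < dist <= 2.5
result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5
# Set non-zero cells to NaN, then fill them with exp of the original values
result = result.mask(result != 0).fillna(np.exp(result))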
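
To check that the rewrite gives the same numbers as the original, here is a sketch that runs both versions on randomly generated stand-ins for socio and matrices (5 zones instead of 919; the data are synthetic, and the question's applymap step is written as an element-wise map per column). Small floating-point differences are expected because the additions happen in a different order, so the comparison uses np.allclose:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
zones = list(range(1, 6))  # hypothetical: 5 zones instead of 919
n = len(zones)

# Synthetic stand-ins for the question's socio and matrices.
socio = pd.DataFrame(
    {
        "ser_emp": rng.integers(0, 100, n).astype(float),
        "hhpop": rng.integers(0, 500, n).astype(float),
        "acres": rng.integers(1, 50, n).astype(float),
    },
    index=zones,
)
matrices = {
    "avgtt": pd.DataFrame(rng.uniform(1, 60, (n, n)), index=zones, columns=zones),
    "dist": pd.DataFrame(rng.uniform(0, 5, (n, n)), index=zones, columns=zones),
}


def original(socio, matrices):
    """The question's step-by-step version."""
    result = pd.DataFrame(0.0, index=socio.index, columns=socio.index)
    temp = np.log(socio["ser_emp"] + 1) * 0.36310718
    result = result.apply(lambda x: x + temp[x.index], axis=0)
    temp = ((socio["hhpop"] / socio["acres"]) * -0.07379568).fillna(0)
    result = result.apply(lambda x: x + temp[x.index], axis=0)
    result = (matrices["avgtt"].transpose() * -0.05689183) + result
    result = ((matrices["dist"] > 1) & (matrices["dist"] <= 2.5)) * 1.5 + result
    # Element-wise, like applymap: 0 stays 0, everything else becomes exp(x).
    return result.apply(lambda col: col.map(lambda x: 0.0 if x == 0 else np.exp(x)))


def vectorized(socio, matrices):
    """The answer's version."""
    result = pd.DataFrame(0.0, index=socio.index, columns=socio.index)
    result = result.add(
        np.log(socio.ser_emp.add(1)).mul(0.36310718).add(
            socio.hhpop.div(socio.acres).mul(-0.07379568).fillna(0)
        ),
        axis=0,
    )
    avgtt, dist = matrices["avgtt"], matrices["dist"]
    result += avgtt.T * -0.05689183 + ((dist > 1) & (dist <= 2.5)) * 1.5
    return result.mask(result != 0).fillna(np.exp(result))


a = original(socio, matrices)
b = vectorized(socio, matrices)
print(np.allclose(a.to_numpy(), b.to_numpy()))
```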

Comment

This is pretty helpful! I'll go ahead and approach it the way you did. Thank you. I'll wait to see if there are other answers before accepting yours.

Comment

Trying result = result.add(np.log(socio.basic_emp + 1).mul(0.24160834)) returns a DataFrame of all zeros rather than the updated values. Any idea why?

Comment

You're missing axis=0. Without it, add matches the Series against the DataFrame's columns instead of its row index.
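
A sketch of what axis=0 changes, using a hypothetical frame whose row and column labels are the same zone IDs, as in the question. With matching labels the default neither raises nor produces NaN; it lines the Series up with the columns (column j gets s[j]) instead of with the rows:

```python
import pandas as pd

# Hypothetical: row and column labels are the same zone IDs.
zones = [1, 2, 3]
df = pd.DataFrame(0.0, index=zones, columns=zones)
s = pd.Series([10.0, 20.0, 30.0], index=zones)

# Default (axis="columns"): s is matched to the column labels,
# so every row reads 10, 20, 30.
default = df.add(s)

# axis=0: s is matched to the row labels, so row i is s[i] everywhere.
by_rows = df.add(s, axis=0)
print(default)
print(by_rows)
```

With square, identically labelled data the two results are transposes of each other, so the mistake is easy to miss.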

Comment

You're right, thanks. This seems to be working now.

piRSquared · Accepted Answer · 2017-01-05 23:05:49Z
