Python, use of lambda

Question 1

I have the following code:

def gigajoule(row):
    row['Energy Supply'] *= 1000000
    return row
energy = energy.apply(gigajoule, axis=1)

There should probably be a way to simplify this by using a lambda function, but I cannot figure out how to do that.

Question 2

Lambdas don’t (easily) allow you to change the input variable, but (assuming this is Pandas, in which case you should tag appropriately) it looks like that’s not a necessary side-effect because the row is replaced by the output anyway.
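As a minimal sketch of that idea (assuming this is pandas, and using made-up sample data), a lambda can return the converted value and the result can be assigned back explicitly, so nothing is changed inside the function:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
energy = pd.DataFrame({'Energy Supply': [100, 200, 300],
                       'Temperature': [201, 202, 203]})

# The lambda receives one value at a time and returns the converted value;
# it never writes to its input
converted = energy['Energy Supply'].apply(lambda x: x * 1000000)

# Assign the result back explicitly instead of mutating inside the function
energy['Energy Supply'] = converted
print(energy)
```

Here the lambda is applied to a single column rather than to whole rows, which is usually enough when only one column needs converting.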

Question 3

In your example code, you're using df.apply differently from the normal usage pattern. The normal usage would be to generate a new row of values from the provided data without modifying the original data (see the warning about side-effects in the .apply() documentation). That is also the way lambda functions behave -- they generate new values via a one-line calculation, but can't do direct assignments. However, in your case, you are modifying the row you are given and then returning that.

Note that your code may be doing something quite different from what you expect. It does the following:

1. gigajoule receives a row from the dataframe
2. gigajoule modifies the row it received, possibly modifying the original dataframe itself
3. gigajoule returns the modified row
4. pandas assembles the rows returned by gigajoule into a new dataframe
5. You replace the existing dataframe with the new one.

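Step 2 is pretty non-standard (modifying the original dataframe as a side-effect of an apply operation). Whether it actually happens depends on the pandas version: when the row is a view of the frame's data, the following code changes the original energy frame, possibly unexpectedly, while with Copy-on-Write enabled the original is left untouched: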

import pandas as pd
energy = pd.DataFrame({'Energy Supply': [100, 200, 300], 'Temperature': [201, 202, 203]})
def gigajoule(row):
    row['Energy Supply'] *= 1000000
    return row
energy2 = energy.apply(gigajoule, axis=1)
energy # has been modified!
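Since the side effect depends on the pandas version and settings, one way to find out what happens on a given installation is to keep a snapshot and compare afterwards. This is only a sketch; the names original and was_modified are made up for illustration:

```python
import pandas as pd

energy = pd.DataFrame({'Energy Supply': [100, 200, 300],
                       'Temperature': [201, 202, 203]})
original = energy.copy()  # independent snapshot taken before apply()

def gigajoule(row):
    row['Energy Supply'] *= 1000000  # writes to the row it was given
    return row

energy2 = energy.apply(gigajoule, axis=1)

# True if apply() changed the frame it was called on, False otherwise
was_modified = not energy.equals(original)
print(was_modified)
```

Either way, energy2 holds the converted values; only the state of the original energy frame differs between versions.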

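You could use the same pattern with a lambda, like this, which also changes the original frame in a non-standard way (note that Series.set_value was deprecated in pandas 0.21 and removed in 1.0, so this runs only on older versions):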

import pandas as pd
energy = pd.DataFrame({'Energy Supply': [100, 200, 300], 'Temperature': [201, 202, 203]})
energy2 = energy.apply(
 lambda row: row.set_value('Energy Supply', row['Energy Supply']*1000000), 
 axis=1
)
energy # has been modified

You could avoid the non-standard side-effects on the original frame by using .copy(), like this:

import pandas as pd
energy = pd.DataFrame({'Energy Supply': [100, 200, 300], 'Temperature': [201, 202, 203]})
energy = energy.apply(
 lambda row: row.copy().set_value('Energy Supply', row['Energy Supply']*1000000), 
 axis=1
)
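Series.set_value was removed in pandas 1.0, so on current versions a side-effect-free lambda has to be written differently. One possible sketch (not the only way) builds a new Series for each row from a dict, so the incoming row is never written to:

```python
import pandas as pd

energy = pd.DataFrame({'Energy Supply': [100, 200, 300],
                       'Temperature': [201, 202, 203]})

# Build a brand-new Series per row: copy the row's values into a dict and
# override the one entry that needs converting
energy2 = energy.apply(
    lambda row: pd.Series({**row.to_dict(),
                           'Energy Supply': row['Energy Supply'] * 1000000}),
    axis=1,
)
print(energy2)
```

Because the lambda only reads from row, the original energy frame keeps its values regardless of the pandas version.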

But since you're not actually trying to generate a new dataframe (i.e., you actually want to modify the existing dataframe), you could just do this instead, which would be the most standard way of using pandas:

import pandas as pd
energy = pd.DataFrame({'Energy Supply': [100, 200, 300], 'Temperature': [201, 202, 203]})
energy['Energy Supply'] *= 1000000
# or energy.loc[:, 'Energy Supply'] *= 1000000

This example also uses numpy to vectorize the calculation, so it should be much faster than the previous ones.
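To get a rough feel for the speed difference, the two approaches can be timed on a larger frame. This is a sketch with made-up data and sizes; the helper names with_apply and vectorized are hypothetical, and the actual timings depend on the machine and pandas version:

```python
import timeit
import pandas as pd

# Hypothetical larger frame so the difference is measurable
n = 2000
energy = pd.DataFrame({'Energy Supply': list(range(n)),
                       'Temperature': list(range(n))})

def with_apply(df):
    # Row by row: pandas calls the lambda once per row
    return df.apply(lambda row: row['Energy Supply'] * 1000000, axis=1)

def vectorized(df):
    # Whole column at once: the multiplication runs inside numpy
    return df['Energy Supply'] * 1000000

t_apply = timeit.timeit(lambda: with_apply(energy), number=3)
t_vec = timeit.timeit(lambda: vectorized(energy), number=3)
print(f"apply: {t_apply:.4f}s, vectorized: {t_vec:.4f}s")
```

Both functions produce the same values; only the way the multiplication is carried out differs.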

Question 4

The idea of lambdas is that they don't have side effects; that is, they just operate on their input parameters (check this answer for a more detailed explanation).

So you could just return a new row with Energy Supply multiplied by 1 million:

gigajoule = lambda row: {k: v*1000000 if k == 'Energy Supply' else v for k, v in row.items()}

And use it like this:

>>> row = {'something': 1, 'Energy Supply': 1}
>>> row = gigajoule(row)
>>> row
{'something': 1, 'Energy Supply': 1000000}

But really, your full-fledged function works just fine and is much more readable than this thing.

Question 5

There is actually a very simple way that does not require a lambda:

energy['Energy Supply'] *= 1000000

Matthias Fripp · Accepted Answer · 2017-12-26 20:26:56Z
