
Background

I have tons of very large pandas DataFrames that need to be normalized with the following operation: log2(data) - mean(log2(data)), where the mean is taken across each row.
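Written out element by element, with x_{ij} the value in row i and column j and n the number of columns (six in the example), the operation as implemented by mean(axis=1) in the code below is a row-wise centering of the log values:

```latex
y_{ij} = \log_2 x_{ij} - \frac{1}{n}\sum_{k=1}^{n} \log_2 x_{ik}
       = \log_2 \frac{x_{ij}}{\left(\prod_{k=1}^{n} x_{ik}\right)^{1/n}}
```

The second form shows that each value ends up expressed relative to the geometric mean of its row.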

Example Data

The example DataFrame my_df looks like this:

```
     iovrrx    nfinsu    mvdfjc    idjges    fubmrg    lvuhfv
0  0.987654  0.206104  0.802920  0.011157  0.860618  0.575871
1  0.706397  0.860083  0.939230  0.436194  0.557081  0.706964
2  0.043139  0.729435  0.597488  0.700998  0.974193  0.917758
3  0.316080  0.461547  0.844540  0.510143  0.908475  0.877330
4  0.828839  0.177670  0.610833  0.328238  0.327697  0.689756
```
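To make the example reproducible, the table above can be rebuilt as a DataFrame. This is only a convenience sketch: the values are copied from the printed table, which is rounded to six decimals, so they may differ very slightly from the original data.

```python
import pandas as pd

# Rebuild the example DataFrame from the printed table so the snippets can be run as-is.
my_df = pd.DataFrame(
    [
        [0.987654, 0.206104, 0.802920, 0.011157, 0.860618, 0.575871],
        [0.706397, 0.860083, 0.939230, 0.436194, 0.557081, 0.706964],
        [0.043139, 0.729435, 0.597488, 0.700998, 0.974193, 0.917758],
        [0.316080, 0.461547, 0.844540, 0.510143, 0.908475, 0.877330],
        [0.828839, 0.177670, 0.610833, 0.328238, 0.327697, 0.689756],
    ],
    columns=["iovrrx", "nfinsu", "mvdfjc", "idjges", "fubmrg", "lvuhfv"],
)
print(my_df.shape)  # (5, 6)
```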

Question

I have tried to perform the normalization above in many different ways, but the following code snippet is the only one I have gotten to work:

```python
import numpy as np
import pandas as pd
# Transpose so the row means broadcast along the last axis, then transpose back.
log_div_ave = my_df.apply(np.log2).values.T - my_df.apply(np.log2).mean(axis=1).values
log_div_ave = pd.DataFrame(log_div_ave.T, columns=my_df.columns)
print(log_div_ave)
```

```
      iovrrx     nfinsu    mvdfjc     idjges     fubmrg    lvuhfv
0   1.667378  -0.593258  1.368628  -4.800610   1.468744  0.889117
1   0.056992   0.340988  0.467991  -0.638518  -0.285601  0.058149
2  -3.467018   0.612699  0.324830   0.555330   1.030127  0.944032
3  -0.941776  -0.395590  0.476099  -0.251165   0.581380  0.531053
4   0.933714  -1.288174  0.493400  -0.402633  -0.405015  0.668708
```

As you can see, I'm converting the DataFrame to a NumPy array and transposing it just so I can subtract the mean of each row. I then have to transpose the resulting array back and reconstitute it as a DataFrame. Is there a simpler way to do all of this?
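The transposes are needed because NumPy broadcasting aligns arrays on their trailing axes: a (5, 6) array minus a (5,) vector of row means does not broadcast. A NumPy-only alternative (not part of the original post) is to keep the reduced axis with keepdims=True, which gives the means shape (5, 1) so they broadcast across the columns. The my_df below is a hypothetical random stand-in, not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for my_df: any DataFrame of positive values works.
my_df = pd.DataFrame(np.random.default_rng(0).uniform(0.01, 1.0, size=(5, 6)),
                     columns=list("abcdef"))

logs = np.log2(my_df.to_numpy())  # shape (5, 6)
# logs - logs.mean(axis=1) would fail here: shapes (5, 6) and (5,) do not broadcast.
# keepdims=True keeps the row axis, giving shape (5, 1), which broadcasts across columns.
centered = logs - logs.mean(axis=1, keepdims=True)
log_div_ave = pd.DataFrame(centered, index=my_df.index, columns=my_df.columns)
print(log_div_ave.round(6))
```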

asked Feb 27, 2017 at 15:04

1 Answer


There's no need to transpose. You can subtract along either axis of a DataFrame using its subtract method.
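A minimal illustration of how the axis argument of subtract changes the alignment, using a small made-up DataFrame (df, row_means and col_means are illustrative names only):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]}, index=["r0", "r1"])

row_means = df.mean(axis="columns")  # Series indexed by r0, r1 -> [5.5, 11.0]
col_means = df.mean(axis="index")    # Series indexed by a, b   -> [1.5, 15.0]

# axis="index": match the Series against the row labels, so each row loses its own mean.
by_row = df.subtract(row_means, axis="index")
# axis="columns" (the default): match the Series against the column labels instead.
by_col = df.subtract(col_means, axis="columns")
print(by_row)
print(by_col)
```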

First, take the log base 2 of your DataFrame. Using apply is fine, but you can also pass a DataFrame directly to NumPy functions such as np.log2 and get a DataFrame back.
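A small check, on a made-up DataFrame, that calling the NumPy ufunc directly gives the same result as apply while keeping the index and columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 8.0], "y": [4.0, 0.5, 16.0]}, index=["p", "q", "r"])

via_ufunc = np.log2(df)        # NumPy ufunc applied directly; returns a DataFrame
via_apply = df.apply(np.log2)  # the apply-based version used in the question

print(type(via_ufunc).__name__)  # DataFrame
print(via_ufunc)
```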

Store the log base 2 DataFrame so you can use its subtract method. You can also reuse this DataFrame when you take the mean of each row.

Finally, subtract along the index axis: from each column of the log2 DataFrame, subtract the matching row mean.

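```python
import numpy as np

log2df = np.log2(my_df)                                # element-wise log2, still a DataFrame
log2mean = log2df.mean(axis='columns')                 # mean of each row, indexed like my_df
log_div_ave = log2df.subtract(log2mean, axis='index')  # align the means on the row labels
```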
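As a quick sanity check (a sketch reusing the example values from the question, which are rounded to six decimals), both approaches give the same numbers, and each row of the result averages to zero:

```python
import numpy as np
import pandas as pd

# Example values copied from the table in the question.
my_df = pd.DataFrame(
    [
        [0.987654, 0.206104, 0.802920, 0.011157, 0.860618, 0.575871],
        [0.706397, 0.860083, 0.939230, 0.436194, 0.557081, 0.706964],
        [0.043139, 0.729435, 0.597488, 0.700998, 0.974193, 0.917758],
        [0.316080, 0.461547, 0.844540, 0.510143, 0.908475, 0.877330],
        [0.828839, 0.177670, 0.610833, 0.328238, 0.327697, 0.689756],
    ],
    columns=["iovrrx", "nfinsu", "mvdfjc", "idjges", "fubmrg", "lvuhfv"],
)

# Question's version: transpose so the row means broadcast, then transpose back.
old = my_df.apply(np.log2).values.T - my_df.apply(np.log2).mean(axis=1).values
old = pd.DataFrame(old.T, columns=my_df.columns)

# Answer's version: stay in pandas and align the row means on the index.
log2df = np.log2(my_df)
log2mean = log2df.mean(axis='columns')
log_div_ave = log2df.subtract(log2mean, axis='index')

print(np.allclose(log_div_ave.to_numpy(), old.to_numpy()))  # True
print(log_div_ave.round(6))
```

One side benefit of staying in pandas is that log_div_ave keeps my_df's row index, whereas rebuilding from a NumPy array as in the question gives the default 0..n-1 index (the two only coincide here because my_df uses the default index).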
James Draper
answered Apr 17, 2017 at 5:21